ABSTRACT

This paper is concerned with studies of the modified
isopreference method for rating speech communication systems with
respect to speech quality. The concept of speech quality is studied by
subjective measurements in terms of intelligibility and "preference".
Listening experiments using the forced pair comparison technique
have been performed with trained and untrained groups of listeners.
Various kinds of speech signals from different systems have been
compared with three idealized reference signals using noise in
additive and multiplicative form as degradation signals. Different
kinds of tests for preference, intelligibility, rank ordering and loudness
are reported which were utilized to study several aspects of speech
quality.
1. THE CONCEPT OF SPEECH QUALITY

1.1 Introduction

During  the  design,  development  and  testing  of  systems 
for  transmission,  reproduction  and  artificial  composition  of  speech 
signals  there  is  a  need  for  evaluation  and  for  optimization  criteria. 

In  the  past  intelligibility  has  been  utilized  as  the  main 
criterion  for  the  evaluation  of  speech  communication  systems. 

In recent years modern speech processing techniques have
reached a high state of perfection. Frequently, the intelligibility
of the output speech signals of such systems is now so close to
100 % that intelligibility alone cannot suffice as a design criterion.
In these cases one has to consider the full concept of "speech quality"
rather than the aspect of intelligibility alone.

The measurement of the physical properties of a speech
processing system, and the combination of the results of such
measurements into a basis for comparing different systems, is at
present promising only for systems with properties close to those of
linear four-pole systems. For all more complex systems this objective
approach is not feasible today.

At the present state of the art subjective measurements are necessary
to find answers to the central questions: "How well does an average
listener understand speech signals which are transmitted or created
by the system under test?" and "How does he like these speech signals,
or the corresponding system, as a source of information?" The first
question can be answered by intelligibility tests and the second by
"preference" tests. While intelligibility tests are already a relatively
well known tool, the additional evaluation of "preference" has until now
remained an only partially solved problem. "Preference" tests shall
allow this aspect of speech quality to be expressed in terms of a set of
known standard reference signals or in terms of a continuously
degradable reference signal. The reduction of the problem to only
two key questions is an important constraint. It is hoped that it
limits the scope of the present work sufficiently. It excludes the
complex problems of speaker recognition and two-way communications.

The aim of the present study is to extend the knowledge
of the concept of speech quality by performing subjective
measurements. It concentrates on methods for preference testing
and on their evaluation. A sufficient body of experimental data is
being collected which will help to find a suitable method for
preference testing and will show its possibilities and limitations. If
possible, a standard test procedure shall be proposed which allows
speech signals to be graded and permits meaningful comparisons between
different types of systems and between measurement results from
different locations.

The work described above is planned to be completed
by the end of 1966. The present interim report can therefore not provide
answers  to  all  of  the  problems  in  question.  It  describes  the  methods 
chosen  for  closer  study  and  summarizes  significant  findings  of  the 
past  year.  In  some  cases  the  collected  body  of  data  was  found  to  be 
still  too  small  and  did  not  allow  for  the  conclusive  determination  of 
typical  averages.  The  collection  and  evaluation  of  the  additionally 
required  data  may  necessitate  some  changes  of  statements  in  the 
present  report.  We  hope  that  in  the  final  report  these  changes  will 
only  be  affirmations  of  our  present  views. 

1.2  Speech  Quality 

Speech quality has many aspects. The degradation or loss of
"quality" in transmitting speech over a telephone system may be seen
as completely different from the degradations in a vocoder system or
some other speech processing device. In the first case it seems
essential to evaluate the degree to which significant characteristics
of the input signal are preserved by the communication channel.
In the second case terms like "identification of the speaker" or the
"emotional content of a message" may lose their importance and
even their meaning; e.g. one might devise a transmission system
with a "synthetic voice" which sounds natural compared with a typical
human speaker but which suppresses all characteristics of the
original speaker. Speech quality may be viewed as a combination
of the different attributes of speech signals which have to be
preserved in order to give a listener the impression of a "high fidelity"
system. It should describe the impression of an average listener
when he compares a speech signal to speech patterns stored in his
memory.


Speech quality includes various factors such as optimum
loudness, timbre and rhythmic character, annoyance, a possible
fatigue of the listener, speaker identifiability, naturalness, clarity,
systematic amplitude or time distortions and many others. A
quantitative definition of these factors is often difficult
and sometimes next to impossible. This means that a detailed concept
of speech quality depends to a certain extent upon interpretation,
at least as long as there exists neither a comprehensive, accurate,
and commonly recognised definition nor a standardised measurement
procedure.

Our  working-definition  of  "quality"  contains  besides 
intelligibility  only  a  parameter  called  "preference".  This  term 
shall  be  an  expression  for  the  average  attitude  of  a  listener  towards 
a  test  signal  while  comparing  it  consecutively  with  a  reference  speech 
signal  with  reproducible  characteristics.  Preference  is  thus  a  relative 
measure  of  quality. 
Direct measurements of the physical properties of a
system for speech communication or speech processing may
often be performed easily but, as already mentioned, the results
cannot always be related to the total subjective impression of an
average listener. As quality is a psychological factor of speech
communication, it also requires, at least today, psychological
measurement techniques. The non-existence of a single listener
with "average" properties and reactions to audio signals makes
it necessary, in order to get statistical significance, to evaluate
subjective judgements from a number of listeners. Unavoidably
this implies a limitation of the expectable accuracy and
reproducibility of the obtained results.

Speech signals with very different qualities may be rated
by simple category tests. Here the listeners classify the test
signal into a limited number of categories guided only by their personal
memory and judgement. Higher reliability results from another
approach where the test signal is presented in pairs together with
samples from a set of reference signals which represent the different
categories. In such a procedure either the test signal or the reference
signal, or both, may be variable, and they may exchange their relative
position in the presented signal pairs. A summary of methods for
assessing subjective factors of speech signals and a bibliography
of work done before 1962 is contained in the paper of MUNSON and
KARLIN /1/. The paper is concerned with a forced pair-comparison
technique which is called the isopreference method. Both reference signal
and test signal are varied. The reference signal was the voice of
a real speaker or a hifi tape recording of a speaker, degraded by
additive random noise. The results of this method are normally
shown in the form of isopreference contours in a speech level
versus noise level diagram (Fig. 1.1).
Fig. 1.1  Isopreference contours in a speech level versus noise level diagram


The curves enclose the point or area N which represents the optimum
setting of the test system with regard to the best adjustment of
loudness and noise level. The method yields a quality rating in the form of
the "Transmission Preference Level", describing the isopreferent
setting of the reference signal, and additionally the optimum loudness
level for the output of the test system.

1.4 The Proposed Method of Preference Rating

ROTHAUSER /2/ tried to duplicate some of the tests
described by MUNSON and KARLIN. He found that the scattering
of the test results was worse than anticipated. Presentation of
successive test conditions along an isopreference contour showed
suitable results because the test persons only have to cling to their
specific criteria for preference judgements. But the deviations
grow intolerably high when points on an isopreference contour with
very low and very high levels of the test signal are compared. Here
the judgements become very inconsistent because most of the listeners
are annoyed by the unexpected and sometimes painfully high levels
of the second signal. This means that the decisions of the listeners
are influenced by the loudness levels of the previously presented
signal pairs. The same accommodation effect can be observed for
abrupt changes of the additive noise which accompanies the test
signal according to Fig. 1.1. In order to reduce the influence of
these practically uncontrollable conditioning effects the number
of variables during a test run has been reduced as far as possible.
Expressed in terms of Fig. 1.1, only the transmission preference
level for the point N is determined. The loudness levels of both test
and reference signal are kept constant at a value equal to the optimum
loudness of the particular system. For a given test run only the S/N
ratio of the reference signal is varied.


Fig.  1.2 


Fig. 1.2 shows the variables in the modified preference test. The
modified method yields not only a simplification of preference tests
but also a substantial improvement with regard to accuracy and
reproducibility of the test results.


2. PREFERENCE TESTING

2.1 Test Method

2.11 Description

The basic requirements for a measuring procedure are
simplicity and reproducibility at different locations. Preference
tests require only comparisons for a string of signal pairs, each
consisting of the test signal and the variable reference signal. In
order to increase the accuracy the reference signal is presented to the
listeners immediately before or after the test signal. The tests are not
based on a particular aspect of quality but on overall preference, with no
requirement that the listeners categorize or explain the
reasons for their decisions. The listeners are not allowed to be
indifferent in their decision between the two signals of any pair.
They have to state which speech sample of each pair they would
prefer as a source of information. Preference
is expressed in terms of a reference signal the quality of which is
continuously and reproducibly adjustable. The quality of the reference
signal is degraded by adding a certain amount of a distortion signal
to a hifi speech signal. The preference level of a test signal can then
be defined as the S/N ratio of the reference signal at which
50 % of all listeners favor the reference signal.
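The definition of the preference level as a 50 % point can be illustrated by a short calculation. The following is only a sketch under stated assumptions: the function name `preference_level`, the linear interpolation between neighbouring S/N settings, and the vote fractions are hypothetical and are not taken from the report. If the vote fractions cross 50 % more than once, the sketch returns the first crossing.

```python
def preference_level(snr_db, frac_prefer_reference):
    """Interpolate the reference S/N (dB) at which 50 % of the votes
    favor the reference signal (assumed linear between settings)."""
    points = sorted(zip(snr_db, frac_prefer_reference))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 == 0.5:
            return x0
        # A sign change of (fraction - 0.5) brackets the 50 % point.
        if (y0 - 0.5) * (y1 - 0.5) < 0:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    if points[-1][1] == 0.5:
        return points[-1][0]
    raise ValueError("the 50 % point is not bracketed by the data")


# Hypothetical votes: fraction of listeners preferring reference signal B
# at each S/N setting of B (illustrative numbers only).
snr_settings_db = [10, 15, 20, 25, 30]
fraction_for_b = [0.0, 0.2, 0.4, 0.8, 1.0]
level_db = preference_level(snr_settings_db, fraction_for_b)
print(f"preference level is about {level_db:.2f} dB")
```

With these illustrative numbers the 50 % point lies between the 20 dB and 25 dB settings.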

In  order  to  avoid  the  difficulties  MUNSON  and  KARLIN 
must  have  encountered  by  using  a  real  speaker  to  produce  the 
reference  signal  during  the  tests,  only  a  hifi  recording  of  such 
a  speaker  was  used  for  the  generation  of  the  reference  signal. 

Fig. 2.1 shows a simplified block diagram of the test set-up.
Fig. 2.1  Block diagram of the test set-up, including the adjustable degradation of the reference signal
The test signal is recorded on one track of a stereo tape recorder,
amplified and periodically fed to the receivers used by the listeners.
On the second track of the tape recorder a hifi speech signal is
recorded. This signal can be reproducibly degraded for the generation
of the reference signal. The test signal A and the reference signal B
are presented in a successive and repetitive order to both ears of
each listener via earphones. The speech level of each signal is
adjusted to the optimum loudness for that particular signal,
which also has to be determined subjectively in a preparatory session.

A  preference  testing  session  may  consist  of  about  5  test  runs 
each  consisting  of  approximately  15  pair  comparisons.  The  test  material 
is  presented  to  the  listeners  in  repeated  signal  pairs  ordered  as  ABAB. 
Fig.  2.2  shows  the  mode  of  presentation  by  illustrating  the  time  pattern 
and  the  variation  of  the  S/N  ratio  of  the  reference  signal  B. 

Fig. 2.2  Mode of presentation: time pattern and S/N ratio of the reference signal B
The cross-hatching illustrates the constant value of the S/N ratio
of the reference signal within one repeated signal pair and its random
variation from one pair to the next. The duration of both speech samples
A and B has been fixed at 5 seconds and the interval between adjacent
signal samples at 0.5 seconds. During a pause of 10 seconds between
the repeated signal pairs, the listeners have to make and to
indicate their decisions, and the operator is able to change the settings
for the next pair comparison.
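The time budget implied by these figures can be worked out directly. The sketch below assumes that one ABAB presentation contains four samples separated by three 0.5-second intervals and is followed by one 10-second pause; this reading of the time pattern, and all names in the code, are assumptions made for illustration.

```python
SAMPLE_S = 5.0    # duration of each speech sample A or B (from the text)
GAP_S = 0.5       # interval between adjacent samples (from the text)
PAUSE_S = 10.0    # decision pause after each repeated pair (from the text)
PATTERN = "ABAB"  # order of presentation (from the text)


def comparison_duration(pattern=PATTERN):
    """Seconds needed for one pair comparison, including the decision pause.

    Assumes gaps occur only between samples, not before the pause.
    """
    n_samples = len(pattern)
    return n_samples * SAMPLE_S + (n_samples - 1) * GAP_S + PAUSE_S


def run_duration(n_comparisons=15):
    """Seconds needed for one test run of n_comparisons pair comparisons."""
    return n_comparisons * comparison_duration()


print(comparison_duration())        # seconds per pair comparison
print(run_duration(15) / 60.0)      # minutes per run of 15 comparisons
print(5 * run_duration(15) / 60.0)  # minutes of signal time in a 5-run session
```

Under these assumptions a run of 15 comparisons takes a little under eight minutes, so a session of about five runs occupies roughly forty minutes of presentation time, not counting breaks between runs.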

Normally, for the evaluation of a test signal a preliminary
test run is executed in order to establish the approximate value of
the preference level and to determine the lower and upper limits
of the S/N ratio for signal B, at which all listeners prefer either
signal A or signal B. During the main test the incremental steps
of the S/N ratio for signal B, i.e. of the degradation of the reference
signal, are chosen much smaller. They should be small enough
to cause inconsistent decisions by some listeners in the vicinity of
their respective preference levels, in order to get the highest possible
accuracy in determining the isopreference level.
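The two-stage procedure can be sketched as a small planning routine for the main test. Everything below is an illustrative assumption: the report does not state that the S/N settings are equally spaced, and the function name `main_run_levels` and the limits used in the example are hypothetical.

```python
import random


def main_run_levels(lower_db, upper_db, n_steps=15, seed=None):
    """Return n_steps reference S/N settings between the limits found in
    the preliminary run, in random order of presentation.

    Equal spacing is an assumption of this sketch.
    """
    step = (upper_db - lower_db) / (n_steps - 1)
    levels = [lower_db + i * step for i in range(n_steps)]
    random.Random(seed).shuffle(levels)  # random variation from pair to pair
    return levels


# Hypothetical limits from a preliminary run: below 10 dB every listener
# prefers test signal A, above 24 dB every listener prefers reference B.
plan = main_run_levels(10.0, 24.0, n_steps=15, seed=1)
print(plan)
```

A fixed seed is used here only so that the same order can be reproduced at another location.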
2.2 Requirements

In spite of the drastic reduction of the variables during a
single test run, as compared to the procedure followed by MUNSON
and KARLIN, there remains a considerable number of parameters
which may influence the results of a preference test. Table 2.1 lists
the most important of these parameters. Even a rough estimate shows
that there are more than a thousand test conditions which differ in at
least one of these parameters.

It  has  been  one  of  the  first  tasks  in  the  work  reported  here 
to  select  the  most  interesting  and  important  test  conditions.  The 
specially  marked  parameters  have  been  actually  used  in  our  tests. 

The  present  study  is  only  concerned  with  continuous  speech.  It  was 
decided  to  present  both  speech  samples,  i.e.  the  test  signal  and  the 
hifi  component  of  the  reference  signal,  at  optimum  loudness  levels 
which  have  been  determined  subjectively  by  the  same  listeners  in 
previous  sessions. 

For the presentation of all signals headphones were chosen
in order to avoid the difficulties which occur in conjunction with
loudspeakers and acoustics. In later investigations loudspeaker
presentation may become more interesting in conjunction with the
testing of speech signals under ambient noise conditions.

2.21 Reference Signals


The  selection  of  the  "best"  reference  signal  for  the  purposes 
of  preference  testing  is  not  easy  and  necessitates  some  compromises. 
(O = parameter value actually used in our tests)

REFERENCE SIGNAL
  TEXT :           continuous O    words    syllables
  LOUDNESS :       variable    adjusted    fixed    optimum O
  PRODUCED BY :    live transmission system :  telephone    vocoder    pulsedeltamod
                   idealized system :          hifi + noise O
                                               hifi x (1 + k noise) O
                                               real speaker + noise
  ADDITIONAL DISTORTIONS :

TEST SIGNAL
  TEXT :           continuous O    words    syllables
  LOUDNESS :       variable    adjusted    fixed    optimum O
  PRODUCED BY :    live transmission system :  telephone O    vocoder O    pulsedeltamod O    any new system O
                   idealized system :          hifi + noise O
                                               hifi x (1 + k noise) O
                                               real speaker + noise

MODE OF PRESENTATION
  TRANSDUCER :     headphones O    handsets    loudspeakers
  AMBIENT NOISE :  none O
                   produced by loudspeakers :  typical noises (office noise)
                                               artificial :  wide band    band shaped
  PRESENTATION FRAME :

LISTENING GROUP
  SIZE :           small (8 - 10) O    large (> 50) O
  TRAINING :       trained    untrained
  TEST REPETITION :

Table 2.1 : Parameters in Preference Testing
A good reference signal should have the following properties:

a) The reference signal should be variable in its quality between
hifi quality and an undefined worst value, so that for all possible
test signals there are corresponding isopreferent reference
signals which lie well inside the total quality range.

b)  For  accurate  test  results  the  reference  signal  should  be 
similar  to  the  anticipated  test  signals  because  the  reliability 
of  judgements  on  speech  quality  is  influenced  by  the  ease  of 
comparisons  between  the  test  signal  and  the  reference  signal. 

c)  The  reference  signal  and  its  generation  should  be  exactly 
defined  and  allow  for  its  simple  and  reliable  reproduction 
in  any  laboratory. 

d)  The  quality  of  the  reference  signal  should  be  easily  measurable. 

e)  It  should  be  easily  interpretable  by  engineers. 

This  list  of  requested  properties  makes  idealized  systems 
the  most  promising  candidates  for  the  derivation  of  a  reference  signal, 
as  normal  live  systems  cannot  be  expected  to  offer  system  conditions 
as  closely  defined  and  variable  as  necessary.  Item  b)  has  been  stated 
although  there  is  no  conclusive  theoretical  way  to  define  the  variations 
and  distortions  of  possible  test  signals  which  can  be  described  in  terms 
of  a  particular  reference  signal. 

In  order  to  get  a  simple  measure  for  the  degradation  of  a 
speech  signal  as  requested  in  b)  and  c)  a  reference  signal  r(t)  can  be 
defined  as  a  hifi  speech  signal  s(t)  plus  a  certain  amount  k  of  any 
distortion  signal  d(t).  This  basic  assumption  may  be  expressed  by 
the  simple  equation 


r(t) = s(t) + k . d(t)

The generation of this function r(t) is shown in Fig. 2.3.

Fig. 2.3  Generation of the reference signal r(t) = s(t) + k . d(t)

Variation of the factor k yields a variable degradation
of the hifi signal s(t) and consequently a variation of the speech
quality of r(t) which can be easily expressed in terms of its S/N
ratio. s(t) may be a real human speaker or only a hifi recording of
such a speaker. Now the degradation signal d(t) has to be chosen,
i.e. the question of a suitable reference signal has been reduced
to the question of a suitable degradation signal.
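The relation between the factor k and the S/N ratio of r(t) can be made concrete with a small numerical sketch. It rests on assumptions that are not part of the report: signal levels are taken as RMS values over the whole sample, the sine wave and the Gaussian sequence are mere stand-ins for s(t) and d(t), and the function names are hypothetical.

```python
import math
import random


def rms(x):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def gain_for_snr(s, d, snr_db):
    """Factor k for which 20*log10(rms(s) / rms(k*d)) equals snr_db."""
    return rms(s) / (rms(d) * 10 ** (snr_db / 20))


def additive_reference(s, d, snr_db):
    """r(t) = s(t) + k * d(t), with k chosen for the requested S/N ratio."""
    k = gain_for_snr(s, d, snr_db)
    return [sv + k * dv for sv, dv in zip(s, d)]


# Stand-in signals (illustrative only): a 200 cps tone for s(t) and
# Gaussian noise for d(t), one second at 8000 samples per second.
rng = random.Random(0)
s = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
d = [rng.gauss(0.0, 1.0) for _ in range(8000)]

k = gain_for_snr(s, d, 20.0)
achieved_db = 20 * math.log10(rms(s) / rms([k * v for v in d]))
r = additive_reference(s, d, 20.0)
print(k, achieved_db)
```

For real speech an RMS value over the whole sample is only one possible level definition, and a different definition would shift the S/N scale by a constant.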


Among all possible degradation signals those on the
basis of random noise seem to fit best the desired properties a) - e).
Noise has several advantages including that of being physically
measurable. White noise can be easily shaped by a suitable weighting
curve to get a better approximation to average speech spectra.
During our studies we have utilized different kinds of weighting
networks:

u) A-noise: white noise, the spectrum of which has been
shaped by an A-weighting network.

v) LP-noise: lowpass noise spectrum with a flat envelope
up to about 500 cps and a decay at a rate of 9 dB per octave
above that frequency.

w) PINK-noise: noise with a reduction of the higher frequencies
by 3 dB per octave.

Fig. 2.4 shows the three noise spectra.
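The slopes quoted for the LP-noise and the PINK-noise can be turned into idealized spectral envelopes. The sketch below is an approximation under stated assumptions: the corner at 500 cps is treated as perfectly sharp, the pink-noise level is given relative to an arbitrarily chosen reference frequency of 1000 cps, and the function names are hypothetical. The A-noise is omitted because its shape follows the standardized A-curve and is not described by a single slope.

```python
import math


def lp_noise_db(f_cps, corner_cps=500.0, slope_db_per_octave=9.0):
    """Idealized LP-noise envelope: flat up to the corner frequency,
    then falling by 9 dB per octave (sharp corner assumed)."""
    if f_cps <= corner_cps:
        return 0.0
    return -slope_db_per_octave * math.log2(f_cps / corner_cps)


def pink_noise_db(f_cps, ref_cps=1000.0):
    """Idealized PINK-noise envelope: 3 dB per octave reduction,
    expressed relative to the (assumed) reference frequency ref_cps."""
    return -3.0 * math.log2(f_cps / ref_cps)


for f in (250.0, 500.0, 1000.0, 2000.0, 4000.0):
    print(f, lp_noise_db(f), pink_noise_db(f))
```

One octave above the corner the LP-noise envelope has fallen by 9 dB and two octaves above by 18 dB, while the pink-noise envelope changes by 3 dB for every doubling of frequency.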
From a technical point of view A-noise should be preferred.
It is bandlimited on both sides and therefore avoids the problem of
overloading the transducer with energy outside the normal hearing
range. An additional advantage of the A-curve is its standardization
by I.S.O. for acoustic measurements. A filter with such a response
can therefore be assumed to be easily available in acoustic laboratories.
If the listeners are given a choice between the different types of
noises, they seem to favor the pink noise, because it contains less
energy at high frequencies. Comparative tests did not reveal any
significant differences which would make the decision for one of the
three types of noises easier. Results derived with one type of reference
signal can be compared to results with another reference by merely
adding a constant which compensates for the different spectral shapes
of the degradation noises.

The actual generation of the reference signal is shown in
Fig. 2.5. This reference signal r(t) will be called "additive reference"
in the following.


Fig. 2.5  Generation of the "Additive Reference" r(t) = s(t) + k . n_0(t)
The addition of the noise n_0(t) to the hifi signal s(t) is probably the
simplest way to generate a reference signal, but this reference signal
does not have all desired properties. Experience has shown that it
does not comply with the properties listed under b) and d). It has
been requested under b) that reference signal and test signal should
be similar. The additive reference will satisfy this requirement in
only a few practical cases. After becoming familiar with this reference
signal most of the listeners are able to separate its two parts, i.e.
they are aware that they hear hifi speech and simultaneously noise.
The perception of this effect is enhanced by the fact that the noise
degradation signal is always present. This may lead to difficulties
especially when single isolated words instead of continuous texts
are used as test material.

Another problem for the additive reference is listed under d).
The quality of this reference signal is defined in terms of its S/N
ratio which can be written as

level of the speech signal - level of the distortion signal.

While the noise level is measurable to an accuracy of about 0.2 dB,
it is difficult to get a comprehensive definition of the speech level.
It has been decided to circumvent this problem for the moment because
the determination of the speech level with the desired accuracy turned
out to be more difficult than anticipated. All our speech recordings
therefore carry a preceding pilot tone which allows the
tapes always to be played at the same level. The problem of the
measurement of absolute speech levels could thus be postponed.

Considerations of the two problems of dissimilarity and of
exact speech level measurements, mentioned above, recommend a
search for other types of reference signals.
A promising distortion signal was found by multiplying
the hifi speech signal with random noise:

d(t) = s(t) . n'_0(t)

The corresponding reference signal

r'(t) = s(t) + k . d(t) = s(t) . [1 + k . n'_0(t)]

will be referred to as "Multiplicative Reference".

Fig. 2.6 demonstrates the main differences between the
two reference signals. In contrast to the additive degradation signal,
the multiplicative degradation signal is not present during speech
pauses; thus it cannot be separated from the hifi speech signal by
the listeners. Additionally, it also overcomes the second problem
of the additive reference, as it does not require exact speech level
measurements.

Fig. 2.6  Reference Signals
Here the speech level does not have to be determined with high accuracy.
The operation s(t) . n'_0(t) causes the distortion level to change in
just the same way as the speech level. Therefore the S/N ratio
of the multiplicative reference is independent of the speech level.
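This independence can be checked numerically. The sketch below compares the additive and the multiplicative reference when the speech level is halved; it assumes RMS levels, uses a sine wave and Gaussian noise as stand-ins for s(t) and the noise signal, and introduces hypothetical names throughout.

```python
import math
import random


def rms(x):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def snr_db(speech, distortion):
    """S/N ratio in dB, taken as the difference of the two RMS levels."""
    return 20 * math.log10(rms(speech) / rms(distortion))


rng = random.Random(1)
s = [math.sin(2 * math.pi * 150 * n / 8000) for n in range(4000)]  # stand-in speech
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]                 # stand-in noise
k = 0.1                                                            # degradation factor


def additive_snr(gain):
    """S/N of s + k*noise when the speech level is scaled by gain."""
    speech = [gain * v for v in s]
    return snr_db(speech, [k * v for v in noise])


def multiplicative_snr(gain):
    """S/N of s + k*s*noise when the speech level is scaled by gain."""
    speech = [gain * v for v in s]
    return snr_db(speech, [k * sv * nv for sv, nv in zip(speech, noise)])


print(additive_snr(1.0) - additive_snr(0.5))              # changes with level
print(multiplicative_snr(1.0) - multiplicative_snr(0.5))  # stays the same
```

Halving the speech level lowers the S/N ratio of the additive reference by about 6 dB, while that of the multiplicative reference is unchanged, because the distortion term scales with the speech.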

The better fit of the multiplicative reference to the list
of required general properties, compared with the additive reference,
is reflected in the test results. On the average the standard deviations
of the test results are smaller when preference
tests are performed with multiplicative noise, mainly because of the
greater similarity of the test and reference signals.

The generation of the multiplicative reference signal r'(t)
is shown in the block diagram of Fig. 2.7. The multiplication of
speech s(t) and noise n'_0(t) is done by a Hall multiplier,
a block diagram of which is shown in Fig. 2.8.

Fig. 2.7  Generation of the "Multiplicative Reference"

Fig. 2.8  Block diagram of the Hall multiplier

The output voltage of this multiplier can be expressed by the equation

U_H = R_H . i_s . B + k_1 . i_s + k_2 . B

k_1, k_2 .... constants
R_H .... Hall constant

Making the control current i_s proportional to the speech signal s(t)
and the magnetic field B, caused by the field current i_F, proportional
to the noise signal n'_0(t), one gets the output

U_H = K . s(t) . n'_0(t) + K_1 . s(t) + K_2 . n'_0(t)

K, K_1, K_2 .... constants
The terms with K_1 and K_2 are deviations from the ideal product
s(t) . n'_0(t). They are caused by non-ideal properties, the so-called
"zero components", of the Hall multiplier. A compensation of these
terms by some special arrangements of the circuit is possible.
Finally, the resulting accuracy of the Hall multiplier is about 30 dB
relative to the optimum output signal. A complete circuit diagram of
the generation of the multiplicative reference is given in the Appendix.

The Hall multiplier poses a second problem besides its
zero components which were mentioned above. The magnetic field
B is proportional to the field current i_F which has to flow through
the field coil with its high inductance. The field current i_F should
have a spectrum corresponding to that of the noise signal, and because of
the high inductive load there are difficulties with the high frequency
components of n'_0(t). A power amplifier is necessary as a current
source, and additionally the spectrum of the noise input signal has
to be pre-emphasized at high frequencies. The spectrum of the field
current i_F, and with it that of the noise signal n'_0(t), is given in Fig. 2.9.
Although in most of our experiments we have generated
the multiplicative reference by a Hall multiplier, we were not too
well satisfied with the performance of this electronic device.

We therefore tried to find another, more suitable way
of generating the product of two signals. The new idea was to interpret
the product of two suitable functions as a controlled switching process,
because a switch can be more easily and accurately implemented than
an analog multiplier. The desired reference signal should have a
noisy character. This could be achieved, e.g., by random interruptions
or polarity inversions of the hifi speech signal. It was decided to utilize
the second method, random inversion at periodic intervals. The pulse chain
with random character for control of the inverter switch is derived from
the output of a noise generator by sampling the noise signal periodically
with a certain clock frequency.

The properties of the pulse chain can be specified by the
clock frequency used and the probability that it will actuate the
inversion switch at the sampling points. This probability was fixed
at 50 %. The remaining free parameter, the clock frequency, is
chosen so that the intelligibility of the product signal s(t) . n_0(t)
is as low as possible.


Fig. 2.10  Intelligibility versus clock frequency (kcps)

The relation between intelligibility and clock frequency is shown in
Fig. 2.10; the minimum of intelligibility occurs at a clock frequency
of about 4 kcps.


Fig. 2.11  Random pulse generator


The generation of the switching function n*(t) is shown in the block
diagram of Fig. 2.11 and in the timing diagram of Fig. 2.13. After the
noise signal n(t) has been sampled by a scanner, the resulting pulses are
passed through an amplitude filter. The remaining pulses control a
bistable multivibrator (bMV), which generates the switching function
n*(t). A control loop including a servo amplifier is provided to ensure
the 50 % probability at the switching points of n*(t).


Fig. 2.12  Digital reference


The generation of the signal

    r(t) = s(t) + k · s(t) · n*(t),

which will be referred to as the "digital reference", is shown in Fig. 2.12.
The digital reference sounds similar to the multiplicative reference. As
its electronic implementation causes fewer problems than the Hall
multiplier, it seems to be superior to the latter. More experience
with this digital reference is still needed before we can make more
definite recommendations. The Appendix contains detailed circuit
diagrams for the implementation of this concept.
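
The switching principle can be illustrated numerically. The following
Python sketch is not the circuit of the Appendix; it is a minimal
simulation under stated assumptions: the signals are sampled sequences,
the switching function takes the values +1 and -1, and "50 % probability"
is read as the chance that the bistable stage is toggled at each clock
tick. The function names and the parameters sample_rate_hz, clock_hz and
seed are hypothetical.

```python
import random


def switching_function(num_samples, sample_rate_hz, clock_hz,
                       p_toggle=0.5, seed=0):
    """Return a +1/-1 sequence that can change sign only at clock ticks.

    Assumption: at every clock tick the bistable stage is toggled with
    probability p_toggle (50 % in the report).
    """
    rng = random.Random(seed)
    samples_per_tick = max(1, round(sample_rate_hz / clock_hz))
    state = 1
    out = []
    for i in range(num_samples):
        if i % samples_per_tick == 0 and rng.random() < p_toggle:
            state = -state  # polarity inversion of the speech signal
        out.append(state)
    return out


def digital_reference(speech, switching, k):
    """Evaluate r(t) = s(t) + k * s(t) * n*(t) sample by sample."""
    return [s + k * s * n for s, n in zip(speech, switching)]
```

With k = 0 the undistorted speech is returned; with k = 1 every sample is
either doubled or cancelled, depending on the momentary sign of the
switching function.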

2.22  Test Signals

Speech test signals are the output signals of any natural or
artificial speech transmission, reproduction or composition system
whose speech quality is to be evaluated. For our purpose of
studying the usefulness of a method for preference testing, its
performance should be evaluated with a large variety of test signals.

This is necessary in order to prove the reliability and justification
of a procedure which uses only one kind of reference signal for
comparison with the numerous possible variations of the properties
of a speech signal.

The speech test material used consists of continuous
text and not of single words or syllables. The test signal is
presented to the listeners at optimum loudness and is compared with
a variably degraded reference signal. The determination of the
optimum loudness of a particular speech signal is discussed in
Sec. 3.13. For our preference tests we used a set of test signals
produced by three different kinds of speech systems, as shown in
Table 1.1:

a) LIVE SYSTEMS:          Natural systems which are in normal use as
                          speech transmission or processing systems.

a1) Telephone:            Real local telephone circuit (Tel) using a
                          transmission loop from one location over a
                          dialled connection to the PBX and back to
                          the same location.

a2) Vocoder:              Channel vocoder from the IBM Laboratory
                          Vienna /5/ in a special setting:
                          Fundamental frequency:  normal     (VCN)
                                                  110 cps    (VC1)
                                                  200 cps    (VC2)

a3) Delta Modulation:     Pulse modulation system in the following
                          settings:
                          Sampling frequency:     7.2 kcps   (PD1)
                                                  20 kcps    (PD2)
                                                  43.2 kcps  (PD3)
                                                  60 kcps    (PD4)
                                                  120 kcps   (PD5)

b) IDEALIZED SYSTEMS:     Artificial systems which produce an output
                          signal consisting of a high fidelity
                          recording of real speech, variably distorted
                          by any form of additive or multiplicative
                          noise.

b1) Additive Noise:       Additive reference signal (ADD),
                          hifi + k · noise.

b2) Multiplicative Noise: Multiplicative reference signal (MULT),
                          hifi · (1 + k · noise); Hall multiplier.

b3) Digital Noise:        Digital reference signal (DIG),
                          hifi · (1 + k · noise); random pulse chain.


Because their degradation is variable and adjustable, these
speech signals are used primarily as reference signals and have therefore
been described in detail in the previous section. When preference tests
are conducted between two of these speech signals, one of them may act as
the variable reference signal while the other is held constant and acts
as the test signal. The results yield the important relations between the
three reference signals, which allow a first examination of the
transitivity of the proposed method of preference testing.


c) SIMULATED SYSTEMS:     Artificial systems the properties of which
                          are simulated by any conceivable distortion
                          of speech signals, e.g. filtering, clipping,
                          echo, crosstalk etc. of a hifi speech signal,
                          or live systems with any additional
                          artificial distortions.

c1) Lowpass:              Speech filtered by a lowpass (LP) with a
                          cut-off frequency of 1 kc. The rejected
                          frequencies were attenuated at a rate of
                          40 dB per octave.

c2) Highpass:             Speech filtered by a highpass (HP) with a
                          cut-off frequency of 1 kc. The rejected
                          frequencies were attenuated at a rate of
                          40 dB per octave.

c3) Live System with      Additional artificial distortions make it
    additive Distortion:  possible to enlarge the set of available
                          test signals and to study the effects of
                          superposed distortions.


2.23  Listening Group

A statement on the general acceptability of a particular system
with regard to the public attitude towards a particular aspect of
speech quality has to be based upon the judgements of a sufficient
number of listeners. In order to prove the usefulness of the proposed
preference method, it is necessary to describe the accuracy of such tests
not only for a group of listeners at one time, but also for the single
listener in repeated tests over a longer time interval. Training of a
special listening group should ensure that the tests can be run under
stable conditions.

At the beginning of a test session the listeners are informed
about the purpose of the test and the testing procedure. In order to
familiarize the test persons with the different types of test signals
which will be encountered during the following test, every new speech
signal is presented for about two minutes before the actual test starts.

Three different groups of listeners have been utilized until
now, all of them male adults between 20 and 35 years of age:

a) A large group of untrained observers. Twice a year
about 80 new students have to take laboratory exercises in our institute.
Hardly any of them have ever been exposed to psycho-acoustic
measurements. These listeners are therefore untrained and perform all the
desired tests without test repetitions. The results of these groups
should show the difference between a large group of untrained and a
small group of trained listeners and hopefully also indicate the
"optimum" number of listeners with regard to a compromise between
financial expense and "statistical" significance of subjective tests.

b) A small group of about 20 trained persons. This testing group
was used for the collection of most of the data contained in this report.
All listeners of this group were examined for normal hearing. They
meet the requirements on auditory acuity in the American Standard
on Measurement of Monosyllabic Word Intelligibility /3/. We have
found that these measurements could be replaced for our purposes
by the correct response to an intelligibility test with monosyllabic
words. Naturally we have utilized German word lists for our students.
With this listening group not only were the preference measurements
conducted, but the group also helped to study side effects such as
training and learning, fatigue, reproducibility etc. Of the first 20
students about a dozen have carried on until now; the others had to be
replaced due to lack of time or interest. Our test room facilities
accommodate a maximum of 10 listeners at one time. We therefore had to
form two groups of 10 listeners each, which also helped to satisfy the
different working schedules of the listeners. Both groups were exposed
to practically the same test material. The number of listeners at
one test session was about 8. The duration of one session, consisting
of about 6 test runs, was two hours on average.

c) A very small "special purpose" testing crew. This group
consists of ourselves and staff members of our institute. Besides
gaining the necessary opportunity to understand the listeners' position
from personal experience, we attacked side problems
such as optimum loudness, check-on-order effect, difference limens
and "critical ranges".

The  question  when  a  single  listener  or  a  group  may  be 
qualified  as  "trained"  for  preference  tests  is  not  easily  answerable 
and  will  be  discussed  in  Sec.  3. 

2.  3  Test  Set-up 

2.31  Test Environment

A room for psychological measurements should be reasonably
free of internal and extraneous noise. It is therefore practical to provide
separate rooms for the listeners and for the operator with his equipment.

As only headphones were employed for the presentation of the acoustic
stimuli, there is no need for the installation of an anechoic
chamber. Furthermore, the KOSS PRO-4 headsets used have soft
ear cushions for additional protection against ambient noise. A test
room with studio character and a short reverberation time is therefore
adequate. A quiet ground-floor laboratory room has been modified for
our purposes. The listener room, approximately 23 x 9 x 10 ft in
size, has its walls covered with perforated box-like aluminium sheets.
A lightweight blanket of glass wool is laid behind them to provide
sound absorption. The ceiling consists of suspended broadband absorber
units of a styropor-like foam material. To reduce the noise level
inside the room caused by extraneous disturbances, the entrance has
been improved by the installation of a second sound-insulating door. A
measurement of the reverberation time yielded an average value of
0.22 s for the empty room and a value of 0.20 s for the room when
occupied by the listeners. A photo in the Appendix shows the interior
of the room.

2.32  Signaling System

Psychological testing is very time-consuming, and it is
therefore desirable to conduct the tests as efficiently as possible.

In the listening room accommodation for 10 listeners has been
installed. As the operator and the test equipment are located in a
separate adjacent room, a signaling system had to be provided in
addition to all the equipment and connections necessary for the
presentation of speech signals. It has been designed and built with an aim
towards automatic operation and towards minimizing the possibilities for
mutual influence among the listeners. For storage and recording, the
test results are not only displayed on a lamp panel, but can also be
printed by an automatically controlled teletypewriter. An
intercommunication link allows for conversation between the listeners and
the operator.

Fig. 2.14 shows a block diagram of the signaling system.

Fig. 2.14


In the listener room there are ten terminals with listener sets
consisting of earphones and the "listener boxes". On each of the boxes
three small lamps "A", "B", and "READY" are mounted, which correspond
to three push buttons "A", "B", and "STOP". During the
test run the signal lamps A and B are controlled by the program switch
and indicate the corresponding speech signals as they are presented.
After the presentation of a repeated signal pair, each listener indicates
his decision by pushing the corresponding button A or B. The
"READY" lamp then indicates to the listener that he has made
his decision. After a wrong decision, or for some other reason, each
listener can stop the test run by pushing the "STOP" button.


In the operator room a light display and control unit stores
the listeners' decisions in a bank of relays. They are displayed visually
by corresponding lamps. After all decisions have been received,
a scanner automatically reads the results into an electric teletypewriter.
Then all lamps are reset and a start impulse to the program switch
starts a new test run.

A block diagram of the signaling between the parts of
the test equipment and the principal connections between listener
and operator room can be found in the Appendix.

2.33  Test Equipment

A block diagram of the test set-up for conducting
pair-comparison tests is shown in Fig. 2.15. This set-up is utilized for
our experiments. Two channels are provided for two different audio
signals, for the present study mostly speech signals. Controlled by
one noise generator, two separate distortion signals can be generated
which allow for independent degradation of the signals in the two main
channels. A program switch controls the presentation of the acoustic
stimuli to the listeners, who can then make the desired observations
with regard to any special property of the samples, e.g. preference,
loudness etc.

Hifi Speech Signal

In contrast to the experiments reported by MUNSON and KARLIN,
we did not use a real speaker for the presentation of speech signals to
the listeners. A hifi tape recording avoids all the difficulties which
arise when running subjective tests with a real speaker. Among the
advantages of a high fidelity tape recording are the unlimited
reproducibility of the speech signal with invariable articulation and
reproducible loudness level, so that the tests may be repeated with
identical signals as often as desired. The utilization of a real speaker
would create another serious problem. Obviously, his utterances could
not be presented over headphones. The whole mode of presentation would
have to be changed. In contrast to these problems with a real speaker,
the main disadvantage of tape recordings is their limited quality. As we
are at present only interested in the evaluation of test signals which have
"telephone quality", our hifi recordings are still by far superior in
quality. Therefore no changes in the quality ratings we find are to be
expected if a real speaker were used instead of tape recordings during
the test.


The speech material used in our tests is taken from a master
tape which was prepared previously at the IBM Laboratory Vienna.
A professional radio speaker read public news under studio conditions.
The recordings were made by means of a dynamic AKG microphone type
D 20 B and an Ampex 351 tape recorder. The frequency range of the
recordings is better than -3 dB from 50 to 15,000 cps. The ambient
noise conditions during the recordings yielded a S/N ratio on the
master tape of about 50 dB.

On the master tape there is additionally a 1000 cps pilot tone
as reference for the purpose of level measurements. Besides the
continuous text, 400 monosyllabic German words have been recorded
under the same conditions to be used for intelligibility testing.

The tapes actually used for testing are prepared in the following
manner. The master tape is re-recorded from the AMPEX to the first
track of the test tape using a REVOX G 36 tape recorder.
With the output of the AMPEX connected to the input of the system to
be evaluated, the system output signal is then recorded on the second
track of the test tape, preferably text-synchronous with the text on
track 1. Each recording is again preceded by a pilot tone for convenient
adjustment of the speech level. After the 400 single words have been
recorded after passing through the system to be tested, the preparation
of the test tape is finished. The content of the test tape is shown in
Fig. 2.16.

Fig. 2.16


Accurate level measurements of speech signals are a well-known problem.
We could circumvent this problem for the work reported here. As we
wanted to study mainly the test procedure and the reproducibility of the
listeners' decisions, there was no need for the absolute speech level
measurements which are mandatory for absolute quality ratings with
the additive reference. It was only necessary to keep the speech level
as constant as possible during the recording sessions and to take care
of reproducibility during the reproduction of speech material. As
already mentioned, a pilot tone was used for initial level adjustment
and a graphic level recorder for continuous monitoring of the speech
level. Obviously, difficulties may arise when the original speech
material, i.e. the master tape, is changed. Instead of establishing
electrical reference conditions it was decided to refer always to the
acoustical input to the ears of the listeners.

When the speech samples are presented to the listeners via
earphones, the sound pressure level of the speech signal is measured
objectively with an artificial ear. Considering the well-known difficulties
with the standardized artificial ear for measurements at higher
frequencies, and with its proper acoustic coupling to headphones with ear
cushions like our KOSS PRO-4, a simplified construction was used which
gives at least reproducible results. Fig. 2.17 shows a sketch of our
arrangement of a wooden plate with an inserted condenser microphone. The
earphone is pressed to the plate with about 1000 g. The air volume
remaining between the plate and the membranes is about 12 cm³.


Fig.  2.  17 


The chosen procedure for speech level measurements tries to follow
a practice for the measurement of certain impulsive noises. There the
level is defined as the arithmetic average of the maximum values of the
A-weighted sound levels. This average is determined by considering
only those maximum values which are within 10 dB of the highest occurring
value. The average has to be taken over a suitable period, which in our
case may coincide with the duration of the speech sample to be measured.
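
The averaging rule just described can be written down in a few lines.
The Python sketch below is only an illustration: it assumes that the
maximum values of the A-weighted level have already been read from the
level recorder as a list of dB values, and that "within 10 dB" includes
the boundary. The function name speech_level_db is hypothetical.

```python
def speech_level_db(max_levels_db, window_db=10.0):
    """Arithmetic mean of those maxima lying within window_db of the highest one.

    max_levels_db: maximum values of the A-weighted sound level in dB,
    as read from the recording over the duration of the speech sample.
    """
    if not max_levels_db:
        raise ValueError("need at least one maximum value")
    top = max(max_levels_db)
    # Discard maxima more than window_db below the highest occurring value.
    kept = [v for v in max_levels_db if v >= top - window_db]
    return sum(kept) / len(kept)
```

For the readings 80, 78, 72, 65 and 60 dB only the first three values
enter the average.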


A subjective method for the measurement of speech level,
especially with respect to the S/N ratio actually used, will be discussed
in Sec. 3.13.

2.4  Analysis of Test Data

In preference tests listeners have to decide whether they
prefer the test signal A to the reference signal B or vice versa. As
listeners are forced to decide either for signal A or for signal B, the
test data form a complete sample space with the reference signal,
given by its S/N ratio, as a parameter. Data collected in
several separate tests can be pooled for evaluation like
other statistical material.

The  following  considerations  may  form  a  basis  for  improvements 
and  refinements  of  the  test  procedure  and  test  evaluation  as  they  are 
performed  now.  In  the  next  subsections  several  theorems  and  relations 
known  from  mathematical  statistics  will  be  used  without  giving  any 
proofs.  The  interested  reader  should  refer  to  the  pertinent  literature. 

Our  considerations  shall  only  give  some  suggestions  for  handling  the 
data  obtained  by  preference  tests. 

Although the actual tests are run with groups of listeners,
it is useful first of all to consider the decisions of an individual
listener under test. In a second subsection groups of listeners will be
considered, and finally the method which we utilize at present for
the processing of test data will be described.

2.41  Basic Statistical Considerations


The  idealized  individual  listener 

For the following considerations we define the idealized
individual listener as a person who decides in such a manner that the
relative frequency of preferring signal B to signal A converges to the
corresponding probability for any S/N ratio of signal B. Further we
assume that the probability of preferring signal B to signal A
does not decrease when the S/N ratio of B increases. This is
shown in Fig. 2.18, where p_x denotes the probability that signal B is
preferred to signal A at a S/N ratio of x dB for signal B. The
function p_x versus the S/N ratio x will be called the "graph of
preference" for short.

Our assumption implies a stationary behaviour of the idealized
listener, i.e. he should have a fixed opinion about the quality of the
test signal. This assumption takes no account of any effects
of learning, training, habituation, fatigue etc. which may occur
when tests are performed over a long period of time. Our results
show that a real individual listener is in good agreement with the
ideal listener postulated above within a test period of several hours.

In Fig. 2.18 the abscissa of the graph of preference can be
divided into three intervals. In the interval on the left-hand side
signal A is always preferred to signal B, so that decisions within this
interval will have a high degree of certainty. Similarly, signal B will
be preferred with a high degree of certainty when its S/N ratio is
located on the right-hand side of the abscissa. For S/N ratios of
signal B within the middle part of the abscissa the decisions of our
listener will contain a random element, but in a fraction p_x of a large
number of identical comparisons at a certain S/N ratio x signal B will be
preferred to signal A. Within this middle part of the abscissa there also
exists a particular S/N ratio at which the individual listener will find
both signals A and B "isopreferent", that is, where the probability of
preferring B to A is just equal to 50 %.


For the moment we are mainly interested in signals B which
are located in the middle part of the abscissa mentioned above,
where the decisions of the idealized listener change between preferring
signal B to signal A and vice versa with a frequency indicated by the
graph of preference shown in Fig. 2.18. Suppose a test is run in which
the S/N ratio of x dB is presented n times to our idealized individual
listener under the same circumstances. This test then forms a
Bernoulli test consisting of a succession of n Bernoulli trials, which
means that each trial is performed under identical conditions with
probability p_x for preferring signal B and (1 - p_x) for preferring
signal A, since preferring A is the complementary statistical event.


For n Bernoulli trials the number S_n of decisions preferring signal B
to signal A is a random variable which is binomially distributed /6/:

$$P(S_n = k) = \binom{n}{k}\, p_x^{k}\, (1 - p_x)^{n-k} \qquad (2.1)$$

This equation specifies the probability that S_n equals k decisions
preferring signal B in n Bernoulli trials with the probability p_x of
preference. In this equation p_x figures as a parameter. For a particular
example the probability P(S_n = k) is shown in Fig. 2.19.


Fig .  2.19 


As k goes from 0 to n, P(S_n = k) first increases monotonically, reaching
its greatest value for k = entier((n + 1) p_x), i.e. the integer part of
(n + 1) p_x, and then decreases monotonically. Further typical data for
the distribution P(S_n = k) are the moments of the distribution. For the
binomial distribution P(S_n = k) the expectation, or first moment, is
given by

$$E(P(S_n = k)) = n\, p_x \qquad (2.2)$$

and the standard deviation, or the positive square root of the second
moment with respect to the expectation, is given by

$$\sigma(P(S_n = k)) = \sqrt{n\, p_x (1 - p_x)} \qquad (2.3)$$

Both values are better considered in proportion to the n trials of which
the assumed Bernoulli test consists. The expectation divided by the
number of Bernoulli trials is equal to the probability p_x of preferring
signal B. At the same time the standard deviation divided by the
number of Bernoulli trials becomes inversely proportional to √n,
which means that for decreasing the relative standard deviation by a
factor c the number of Bernoulli trials must be increased by a factor c².
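
The quantities of Eqs. (2.1) to (2.3) are easy to check numerically. The
following Python sketch uses only the standard library; the function
names are hypothetical, p stands for p_x, and 0 < p < 1 is assumed.

```python
import math


def binomial_pmf(n, k, p):
    """P(S_n = k): probability of k preferences for B in n Bernoulli trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)


def binomial_summary(n, p):
    """Return expectation n*p, standard deviation and most probable k."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    mode = math.floor((n + 1) * p)  # "entier" is the integer part
    return mean, sd, mode
```

For n = 10 and p = 0.5 the expectation and the most probable value are
both 5. Quadrupling n to 40 halves the relative standard deviation sd/n,
in line with the factor c² stated above.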

Now Bernoulli tests shall be utilized to evaluate the probability
p_x of preferring signal B to signal A at the S/N ratio x dB. From
the relative standard deviation of the binomial distribution

$$\frac{\sigma(P(S_n = k))}{n} = \frac{\sqrt{p_x (1 - p_x)}}{\sqrt{n}} \qquad (2.4)$$

it follows that, for a constant value of σ/n at different probabilities
p_x, the number n of Bernoulli trials may be reduced as p_x approaches 0
or 1. Accordingly the number n of Bernoulli trials needed for a desired
accuracy of p_x is a maximum when p_x = 0.5.

Furthermore an estimate can be given for the reliability of
results obtained in a test consisting of n Bernoulli trials by Laplace's
limit theorem, which holds for a large number of trials /7/.
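
As an illustration of these two remarks, the Python sketch below solves
Eq. (2.4) for the number of trials and applies the normal approximation
that follows from Laplace's limit theorem. The report gives no explicit
formula for the reliability estimate, so the interval shown here and the
factor z = 1.96 (roughly 95 % coverage) are assumptions; the function
names are hypothetical.

```python
import math


def trials_needed(p, rel_sd):
    """Smallest n for which sqrt(p * (1 - p) / n) does not exceed rel_sd."""
    return math.ceil(p * (1 - p) / rel_sd**2)


def laplace_interval(k, n, z=1.96):
    """Approximate interval for p from k preferences for B in n trials.

    Uses the normal approximation to the binomial distribution, which
    is only adequate for a large number of trials.
    """
    p_hat = k / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)
```

A relative standard deviation of 0.1 requires 25 trials at p = 0.5 but
fewer trials when p is close to 0 or 1, which is the reason for placing
the sampling points outside the transition region.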




For determining a graph of preference for the individual
listener as shown in Fig. 2.18 one should run Bernoulli tests
with several different S/N ratios of signal B. Since we are mainly
interested in the S/N ratio of isopreference, i.e. the S/N ratio where
the graph of preference crosses 50 %, it is possible to shorten the
test procedure by the following considerations. Several Bernoulli
tests will be performed to find the two ranges of S/N ratios where
the individual listener unambiguously prefers A or B. One finds
these ranges in Fig. 2.18 on the left side and on the right
side of the abscissa, respectively. It is obvious that under these
circumstances the listener will decide more or less without fail. As for
these ranges of S/N ratios the probability p_x of preferring signal B
is close to 0 or 1, results with small deviations can be obtained
with only a small number of test runs.


Let us assume that a S/N ratio specifies the point of
isopreference if, in m Bernoulli trials, the individual listener
votes m/2 times for signal B. From this we can find a distribution
function Q and a density function q for the S/N ratio of isopreference
with regard to m Bernoulli trials from the graph of preference of the
individual listener (Fig. 2.20).

Fig. 2.20

A derivation of the distribution and density functions from the graph
of preference shall not be given here, but it may be accepted that
these functions can be replaced approximately by normal distributions
with the parameters μ_i and σ_i. The subscript i denotes the individual
listener. Obviously, the standard deviation σ_i of this distribution
depends on the number of Bernoulli trials considered. For an infinite
number of Bernoulli trials the distribution function degenerates into
a step function and the standard deviation becomes zero. In this case
the S/N ratio of isopreference for the individual listener will be fully
determined.

By taking advantage of these considerations we can find the
S/N ratio of isopreference for an individual listener by performing
several Bernoulli tests, each consisting of m trials, at S/N ratios where
the decisions of the listener under test are fairly unambiguous. As
shown above, the decisions of our listener will not vary very much
at those S/N ratios and will therefore be very certain. Thus
we can obtain the left part and the right part of the distribution function
of Fig. 2.20 reliably and can then find its middle part by
interpolation. The S/N ratio where this distribution function crosses
the 50 % ordinate will be a good estimate of the desired S/N ratio of
isopreference for the individual listener.
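
The interpolation step can be sketched as follows. The report does not
state which interpolation rule is used; the Python sketch below assumes
simple linear interpolation between the two measured points which bracket
the 50 % level. The function name and the data layout (pairs of S/N ratio
in dB and fraction of votes for B) are hypothetical.

```python
def isopreference_snr(points, level=0.5):
    """Estimate the S/N ratio at which the fraction of votes for B equals level.

    points: iterable of (snr_db, fraction_preferring_B) pairs measured in
    the ranges where the listener decides fairly unambiguously.
    """
    pts = sorted(points)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y0 <= level <= y1 and y1 > y0:
            # Linear interpolation across the unmeasured transition region.
            return x0 + (level - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("the measured points do not bracket the requested level")
```

With measured fractions of 0.05 at 5 dB and 0.95 at 15 dB, the estimate
of the S/N ratio of isopreference is 10 dB.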

For brevity we will henceforth speak of the distribution
of the S/N ratio of isopreference without stating each time the number
of Bernoulli trials on which it is based. By introducing the distribution
of the S/N ratio of isopreference one may reduce the number of tests
which are necessary for determining the S/N ratio of isopreference.

This leads to a significant reduction of the necessary effort in
preference testing. Excluding the transition region when choosing the
sampling points will reduce the accuracy of the test results, but it will
still be comparable to the accuracy limited by the technical facilities
of the test set-up.


A  group  of  listeners 


The considerations concerning the individual listener shall
be extended to a group of listeners. We take the group of listeners
to be a random sample from a population of listeners and their
individual S/N ratios of isopreference as the statistical variable,
which is assumed to be normally distributed. The parameters of this
distribution are called $\mu_g$ for the mean and $\sigma_g$ for the standard
deviation. The subscript g refers to the group of listeners. The
mean of this normal distribution indicates that 50 % of the listeners
will have their S/N ratio of isopreference lower than $\mu_g$ dB and therefore
50 % of the listeners will prefer the reference signal B to the test
signal A when the reference is presented with a S/N ratio of $\mu_g$ dB.
For the measurement of speech quality we are more interested in
the S/N ratio of isopreference $\mu_g$ for the group of listeners than in
the decisions of a single listener. The standard deviation $\sigma_g$ of the
assumed normal distribution is a measure of the differences between the
S/N ratios of isopreference of the individual listeners and may be
of interest as far as the reliability of the test results is concerned.


The determination of the S/N ratio of isopreference for a
group of listeners should start with the isopreferent S/N ratios of the
individual listeners. When the percentage of those listeners
whose S/N ratio of isopreference is lower than x dB is plotted versus
the S/N ratio x, the experimental distribution function will be a
staircase function as shown in Fig. 2.21.


Fig. 2.21 (ordinate: % of listeners)

This experimental distribution function approximates the
assumed normal distribution function with the parameters $\mu_g$ and $\sigma_g$.

This approach is very cumbersome, because all S/N ratios of
isopreference for the individual listeners have to be calculated first;
only then can the desired S/N ratio of isopreference for the total group
be estimated. Therefore another method shall be described for
the evaluation of the S/N ratio $\mu_g$ of isopreference for the group,
which is not as "accurate" as the method mentioned above, but
which is much easier to carry out and still yields sufficient
accuracy.

A simplified case shall be considered first: we assume graphs
of preference for all individual listeners in the form of simple step functions.
Of course that is only a rough approximation to reality, yet it is
very helpful for introducing the following method into the present
concept. With this assumption it follows that at any S/N ratio of the
reference each listener votes without fail either for signal B or for
signal A, whatever the number of trials performed. Therefore
one can use a test procedure in which each S/N ratio of the reference
is presented just once to the group of listeners, and one then collects
their decisions. In this case this simple procedure already yields the
experimental distribution function of Fig. 2.21.

If one goes back to the real test conditions, graphs of preference
for the individual listeners may look like Fig. 2.18. Running
the same test procedure as described just above, we will have several
listeners in the group voting not in accordance with their S/N ratios
of isopreference. Plotting again the percentage of listeners preferring
signal B versus the corresponding S/N ratio, one finds that this empirical
distribution function is not necessarily a monotonically increasing function
as it should be. This is caused by the "fail" votes of listeners (Fig. 2.22).
Still we may approximate this empirical distribution function by a normal
distribution, the parameters of which are called $\mu$ and $\sigma$. It is reasonable
to take the mean value $\mu$ obtained in this manner as an
approximate S/N ratio of isopreference for the group of listeners
under test. The mean value $\mu$ will not differ very much from the
mean value $\mu_g$ discussed above. But the standard deviation $\sigma$ obtained
now will be quite different from $\sigma_g$ defined above. It will be the aim of
the following paragraph to give the relation between these two standard
deviations.

In the preceding section we have spoken about a distribution
for the S/N ratio of isopreference which can be evaluated for an individual
listener provided a certain number of trials has been performed. In this
connexion we shall accept normal distributions for these S/N ratios of
isopreference with equal standard deviations $\sigma_i$ for each individual
listener. The mean values of these distributions may of course differ,
corresponding to the distribution of the S/N ratios of isopreference
over the population of listeners assumed previously. Based
on these premises it can be stated that the standard deviation $\sigma$
obtained by evaluating the S/N ratio of isopreference directly from the
votes of listeners under test will be larger than the standard deviation
$\sigma_g$ obtained by evaluating the S/N ratios of isopreference of the
individual listeners. The following equation holds /6/ :

$$\sigma = \sqrt{\sigma_g^2 + \sigma_i^2} \qquad (2.6)$$

The increase of the standard deviation when the total votes
of listeners at each S/N ratio are evaluated is obvious if one considers that
certain "fail" votes of listeners can be eliminated by pre-evaluating S/N ratios
of isopreference for single listeners.
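
One way to make the relation between the standard deviations plausible is the following short derivation. It is a sketch under assumptions not stated explicitly in the text: the deviation $u$ of a listener's isopreference S/N ratio from the group mean and the trial-to-trial uncertainty $e$ of that listener are taken to be independent and normally distributed, and $x$, $u$ and $e$ are names introduced here for illustration only.

```latex
\begin{align}
x &= \mu_g + u + e, \qquad u \sim N(0, \sigma_g^2), \quad e \sim N(0, \sigma_i^2) \\
\operatorname{Var}(x) &= \operatorname{Var}(u) + \operatorname{Var}(e) = \sigma_g^2 + \sigma_i^2 \\
\sigma &= \sqrt{\sigma_g^2 + \sigma_i^2}, \qquad
\sigma_i = \sqrt{\sigma^2 - \sigma_g^2}
\end{align}
```

The last form shows how an estimate of the individual uncertainty could be obtained once the two group values are known.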

In this section we have given two principal methods for
evaluating the data collected in preference tests. Both methods lead
to more or less the same S/N ratios of isopreference for groups of
listeners, whereas the standard deviations of the approximating
normal distributions are different. Since we are mainly interested
in the S/N ratio of isopreference for groups of listeners we may take
advantage of the method which requires less effort.

2.42 Processing of Test Data

In the course of preference tests reference signals B having
several S/N ratios are presented to a group of listeners who decide
at each S/N ratio whether they prefer the reference signal B to the
test signal A. The votes of the listeners under test are the collected
data.


At first the percentage of votes for B is calculated at each
S/N ratio of the reference. This percentage will be taken with respect
to the total number of listeners in the group when the S/N ratio of
isopreference is to be evaluated for the group. But the percentage
will be taken only with respect to the number of presentations when
S/N ratios of isopreference are to be evaluated for individual
listeners.
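
The two ways of forming the percentage differ only in what is counted in the denominator. The following Python sketch illustrates this; the function name `percent_for_b`, the data layout (a dictionary from S/N ratio to a list of votes) and the vote values are assumptions made for this example, not the format used by the programs in the Appendix.

```python
def percent_for_b(votes):
    """Percentage of votes for signal B at each S/N ratio of the reference.

    votes: dict mapping an S/N ratio (dB) to a list of votes "A" or "B".
    For a group evaluation the list holds one vote per listener; for an
    individual evaluation it holds one vote per presentation to that listener.
    """
    return {
        snr: 100.0 * ballot.count("B") / len(ballot)
        for snr, ballot in sorted(votes.items())
    }


# Group evaluation: four (hypothetical) listeners, one presentation each.
group = percent_for_b({0: ["A", "A", "B", "A"], 4: ["B", "B", "A", "B"]})

# Individual evaluation: one listener, three repeated presentations.
single = percent_for_b({0: ["A", "A", "A"], 4: ["B", "A", "B"]})
```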


We assume the percentages just mentioned to be
approximate values of the distribution of the probability that a S/N
ratio of isopreference is below the S/N ratio just considered. We take
for granted that all considered sets of S/N ratios of isopreference are
normally distributed. The percentages of votes for signal B which we
obtain in preference tests as a function of the S/N ratios of the
reference signals will come close to the assumed distribution function.

We may approximate the latter function by plotting a smooth curve
between the points given by the percentages of votes versus the S/N
ratios of the reference signals (Fig. 2.23). This smooth curve shall
be calculated as a normal distribution function with the mean value $\mu$
and the standard deviation $\sigma$:

$$\Phi(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{(t-\mu)^2}{2\sigma^2}\right) dt \qquad (2.7)$$

Fig. 2.23

Later on, wherever necessary, we shall distinguish by subscripts to
the parameters $\mu$ and $\sigma$ between approximations to different distributions.
There are the distributions of the S/N ratios of the individual listeners
($\mu_i$, $\sigma_i$), those of groups calculated from the single S/N ratios of
individual listeners ($\mu_g$, $\sigma_g$), and those of groups calculated directly
from the votes of individual listeners ($\mu$, $\sigma$) (Sec. 2.41). The
evaluation of a proper $\Phi(x;\mu,\sigma)$ is performed by the concept of the
least mean square error defined by

$$F(\mu,\sigma) = \frac{1}{m} \sum_{i=1}^{m} \left[ y(x_i) - \Phi(x_i;\mu,\sigma) \right]^2 \qquad (2.8)$$

In this equation $y(x_i)$ stands for the percentage of listeners voting
for signal B presented with the S/N ratio of $x_i$ dB. The sum is taken
over all m presentations of a certain preference test.


The parameters $\mu$ and $\sigma$ of the approximating normal distributions
will be evaluated by minimizing $F(\mu,\sigma)$. For that purpose an iteration
procedure programmed on a digital computer may be used. The
procedure starts with an approximate value $\mu_1$ for the desired mean value
$\mu$. The approximate value $\mu_1$ should best satisfy the following equation

[Equ. 2.9: not legible in the source]

Additionally an approximate value $\sigma_1$ for the desired standard deviation
$\sigma$ has to be assumed. $x_A$ shall denote the highest S/N ratio of the
reference where all votes are for signal A, and $x_B$ shall denote the
lowest S/N ratio of the reference where all votes are for signal B.
Then we assume

[Equ. 2.10: not legible in the source; it gives $\sigma_1$ in terms of $x_A$ and $x_B$]

Experience proved the practicality of Equ. 2.9 and Equ. 2.10. For
the example shown in Fig. 2.23 they give $\mu_1$ = 3 dB and $\sigma_1$ = 2 dB. Now we take
the normal distribution $\Phi(x;\mu_1,\sigma_1)$ as a first approximation and evaluate
the mean square error $F(\mu_1,\sigma_1)$ defined by Equ. 2.8 for the given test
data. For varying the parameters we then take the following
eight couples of parameters:

$$(\mu_1 \pm \Delta_1,\ \sigma_1), \quad (\mu_1,\ \sigma_1 \pm \delta_1), \quad (\mu_1 \pm \Delta_1,\ \sigma_1 \pm \delta_1)$$

with $\Delta_1$ = 1 dB and $\delta_1$ = 1 dB. After calculating the eight mean square
errors for these distributions they are compared with $F(\mu_1,\sigma_1)$. If
one of them is smaller than $F(\mu_1,\sigma_1)$, it is called $F(\mu_2,\sigma_2)$, and $\mu_2$
and $\sigma_2$ are used for the next iteration step. But if $F(\mu_1,\sigma_1)$ is still the
smallest mean square error we shall vary the couple of parameters
as follows:

$$(\mu_1 \pm \Delta_2,\ \sigma_1), \quad (\mu_1,\ \sigma_1 \pm \delta_2), \quad (\mu_1 \pm \Delta_2,\ \sigma_1 \pm \delta_2)$$

with $\Delta_2 = \Delta_1/2$ and $\delta_2 = \delta_1/2$, and repeat the above procedure. This process
converges rather rapidly to the desired parameters $\mu$ and $\sigma$. The
iteration procedure is stopped when $\Delta$ drops below a fixed fraction of
$\Delta_1$ (a power of 2; the exponent is not legible in the source). This
corresponds to an accuracy for $\mu$ better than 0.1 dB. For the actual
numerical processing the normal distribution function $\Phi(x;\mu,\sigma)$ is
calculated by means of an approximation formula $\tilde{\Phi}(x;\mu,\sigma)$ which is
sufficiently accurate for any x /8/, with

$$\tilde{\Phi}(x;\mu,\sigma) = \begin{cases} 1 - \dfrac{1}{2\,(1 + a_1 z + a_2 z^2 + a_3 z^3 + a_4 z^4)^4}, & z \ge 0 \\[2ex] \dfrac{1}{2\,(1 + a_1 |z| + a_2 |z|^2 + a_3 |z|^3 + a_4 |z|^4)^4}, & z < 0 \end{cases} \qquad z = \frac{x-\mu}{\sigma\sqrt{2}}$$

$a_1$ = 0.278393
$a_2$ = 0.230389
$a_3$ = 0.000972
$a_4$ = 0.078108

The procedure described above enables us to fit a cumulative
normal distribution to data which consist of given frequencies $y$ over
S/N ratios $x$. Another task was described in the preceding section:
the calculation of the S/N ratio of isopreference $\mu_g$ for a group of listeners
from the $\mu_i$ of each listener of this group. These $\mu_i$ can also be
calculated by the method given above. The mean value $\mu_g$ can be calculated
simply by

$$\mu_g = \frac{1}{N} \sum_{i=1}^{N} \mu_i$$

where N stands for the number of listeners in the group. The corresponding
standard deviation $\sigma_g$ is found from the deviations of the $\mu_i$ from $\mu_g$
(the equation is not legible in the source).
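
As an illustration, the group values can be computed from the individual isopreference levels as in the Python sketch below. The function name `group_statistics` and the example values are hypothetical. The source does not show whether the sum of squared deviations is divided by N or by N - 1, so the divisor is exposed here as the parameter `ddof` and its default of 1 is an assumption.

```python
import math


def group_statistics(mu_individual, ddof=1):
    """Mean and standard deviation of the individual isopreference levels.

    ddof=1 divides by N - 1, ddof=0 divides by N; which of the two the
    original evaluation used is not known, so this is an assumption.
    """
    n = len(mu_individual)
    mu_g = sum(mu_individual) / n
    variance = sum((m - mu_g) ** 2 for m in mu_individual) / (n - ddof)
    return mu_g, math.sqrt(variance)


# Invented isopreference levels (dB) of three listeners.
mu_g, sigma_g = group_statistics([2.0, 4.0, 6.0])
```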


In all cases discussed so far statistical data were evaluated, i.e.
many or at least some similar data were available. Beyond this the question
may arise whether one can find the point of isopreference for a single
listener from his decisions given in a single test run. Of course this is
a rather poor basis for a "statistical" evaluation. An estimate which
proved to be useful may be found in the following way. It shall be
supposed that the set of presented reference signals has equidistant
S/N ratios. Some examples of possible decision series are shown in
Fig. 2.24.

Fig. 2.24

The point of isopreference is to be expected to lie between the limits
of consistent decisions. The mean value is easily found in series I
to be 7.5 dB. This holds as well for series II because of the symmetrical
decisions. It is rather difficult to define an isopreference level for
series III. The proposed solution is to have as many "displaced" A
decisions as "displaced" B decisions on either side of the thus
determined point of isopreference M.
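
The balancing rule for a single test run can be sketched in Python as follows. The function name `single_run_isopreference`, the S/N ratios and the decision series are invented for illustration and are not the series of Fig. 2.24. It is assumed that the reference levels are equidistant and sorted in ascending order, that "A" votes are expected below the point of isopreference and "B" votes above it, and that the point M is placed midway between two neighbouring levels.

```python
def single_run_isopreference(levels, decisions):
    """Point of isopreference M from one series of decisions.

    levels: equidistant S/N ratios (dB) in ascending order.
    decisions: "A" or "B" for each level, in the same order.
    The cut is placed so that the number of B votes below it equals
    the number of A votes above it.
    """
    if len(levels) != len(decisions) or len(levels) < 2:
        raise ValueError("need matching lists with at least two levels")
    step = levels[1] - levels[0]
    for k in range(len(levels) + 1):
        displaced_b = decisions[:k].count("B")  # B votes below the cut
        displaced_a = decisions[k:].count("A")  # A votes above the cut
        if displaced_a == displaced_b:
            # The cut lies half a step below level k.
            return levels[0] + (k - 0.5) * step


consistent = single_run_isopreference([0, 5, 10, 15], list("AABB"))
symmetric = single_run_isopreference([0, 5, 10, 15], list("ABAB"))
```

Moving the cut up by one level raises the difference between displaced B and displaced A votes by exactly one, so a balanced cut always exists; it lies directly above as many levels as there are A votes.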

The described procedures have been programmed in FORTRAN
for processing on an IBM 7040 digital computer system. The programs
and sample printouts are contained in the Appendix.

The input data for these programs are obtained and stored
in the following formats: the decisions given by the listeners are
recorded by means of the teletypewriter as shown in Fig. 2.25. The
numbers printed at the left side give the respective test condition by
the attenuator setting for the distortion signal. The corresponding listeners'
decisions are printed automatically on the right. The numerals 1 or 2
stand for a decision preferring the corresponding signal A or B. Test
conditions, e.g. signal specifications, date, and listener names, have
to be written by the operator. The decisions are ranked and prepared
for further handling by the test operator while the next test is running.
This turned out to be very useful, because one can control the listeners
and the presented test conditions throughout the test. The form used for
this purpose is shown in Fig. 2.26. The markers in this table give
the reference conditions where the respective listener preferred
signal B and are very easy to survey. Cards are punched from these
data in a format which is specified in the Appendix and fits the
requirements of our FORTRAN programs.

3. EXPERIMENTAL EVALUATION OF PREFERENCE TESTING

3.1 Scope of Measurements

In  this  section  under  the  subtitles  of  speech  quality, 
intelligibility,  loudness  and  human  factors  a  sequence  of  separate 
topics  will  be  treated.  We  have  not  tried  to  fit  all  those  topics  into 
one  large  picture,  because  we  felt  that  until  now  in  some  critical 
areas  we  do  not  have  enough  data  to  establish  final  statements. 

3.11 Speech Quality

Repeated  preference  tests 

For studying the behavior of listeners, preference tests were
performed under equivalent conditions with several groups of listeners.
Tests were repeated during a session with different sequences of the
reference. The normal vocoder (VON) was used as test signal, and hifi
speech signals degraded by multiplicative noise (MULT) or by additive
noise (ADD) as reference signals.

The S/N ratio of the reference signal for isopreference will
be called the isopreference level in the following. The isopreference levels
were determined for the individual listeners by analyzing their votes
separately for all tests performed during one test session, as well as
those for each group of listeners at each test run. Further points
of isopreference for single listeners and for groups were obtained
by evaluating their total votes during one session. By comparison
of the isopreference levels calculated from the group data and from
those for single listeners it could be confirmed that the relation between
the standard deviations holds as given by Equ. 2.6.

The results of tests performed with one group of listeners on
one day (21-5-65) shall be given now. Further results for two other
groups of listeners are given in the Appendix.

Ten listeners compared the normal vocoder (VON)
to the multiplicative reference. The test was repeated 6 times within
a two hour session.

At first, examples shall be given of the votes of the 10 listeners
in one test and of the votes of a single listener in the 6 consecutive test
runs of the session. In the following tables the S/N ratios of the reference
signals presented are listed and the votes of listeners favoring the
reference signal are marked. The listeners are named by capital letters.
The complete data for the session are listed in the Appendix.

Fig. 3.1

From these data we obtained the isopreference levels $\mu_i$ for the individual
listeners and the standard deviations $\sigma_i$ of the approximating normal
distributions. Examples of such normal distributions are given in the
Appendix.

Table 3.1

A schematic diagram of these results is shown in Fig. 3.2.
The mean square of the standard deviations $\sigma_i$ for the 10 listeners is
found to be

$$\bar{\sigma}_i = 1.75 \text{ dB}$$

The isopreference levels of the group for each test run are listed in
Table 3.2 and schematically shown in Fig. 3.3. Examples of normal
distributions approximating the group decisions in separate test runs
are given in the Appendix.

The isopreference level for the group over the whole session was calculated
not only as $\mu_g$ from the isopreference levels of the individual
listeners, which are listed as $\mu_i$ in Table 3.1, but also from the lump sum
of all decisions in the session, denoted as $\mu$.

[values of $\mu$, $\mu_g$ and $\sigma_g$ not legible in the source; $\sigma$ = 2.3 dB]

These results show good conformity. From the standard deviations $\sigma_g$
and $\sigma$ one may deduce an estimate $\sigma_{i,est}$ for the standard deviation of the
decisions of individual listeners in the session. We obtain an estimate
by using Equ. 2.6:

$$\sigma_{i,est} = \sqrt{\sigma^2 - \sigma_g^2}$$

This estimated value is found to be close to $\sigma_i$ = 1.75 dB as calculated above.

These results and those of all similar tests may be summarised: the
isopreference levels for individual listeners vary over a range of about 10 dB.
The respective standard deviations were all around 2 dB. At the same time
the uncertainty range of individual listeners was also found to be about
± 2 dB.

A typical preference test session

The following example shows the results from a complete
test session. Six well trained listeners made judgements
on four test systems HP, LP, VON, and V02 in comparison with the
additive and multiplicative reference. Table 3.3 shows the mean
value M of the single listener for single test runs; in the last
two columns the mean value $\mu$ and the standard deviation $\sigma$ for the
whole group are given.

Table 3.3 (legible headings: DRV, MUE, GROUP)

[...] tests, with the exception of the listener MUE all others give the order
HP - LP - VON - V02 from the best to the worst signal when asked
directly. This should coincide with the preference results expressed
in values of M.


The values M for both references and the mean values $\mu$ of
the group are plotted in Fig. 3.4. The monotonic decrease of M
for the first three listeners shows that the rank order is the same
for both direct and preference comparisons. The results from the last
three listeners contradict the results of the direct judgements in two
cases with the additive reference and in one case with the multiplicative
reference. The mean values $\mu$ of the group are in all
cases in the same order. The standard deviations $\sigma$ are relatively
small with an average value of about 1.8 dB. The same test signals HP,
LP, VON and V02 were also judged by the group of 38 untrained listeners.
For each listener the isopreference levels regarding each of the eight
preference judgements performed were evaluated and their distribution
was plotted. Fig. 3.5 shows such a distribution for a comparison of
VON and MULT.

These distributions can be characterized by mean values $\mu$ and standard
deviations $\sigma$ derived from their first and second moments. For the eight
test runs in this connexion the values $\mu_g$ and $\sigma_g$ are given in T[...]

[...] the mean values and standard [...] are shown in Fig. 3.7 on the [...]

[...] the additive and the multiplicative reference


As we have been working mainly with the two references
ADD and MULT, we tried to find a correspondence between them.
The relation has been studied for its reproducibility not only when
executing the same test on different days and with different listeners,
but also when different test procedures are utilized for its determination.

At first pair comparison tests were made with both signals
varied simultaneously in quality. All these speech pairs, each consisting
of an ADD signal with any S/N ratio compared with a MULT signal
with any other S/N ratio, have been presented to the listeners in
totally random order. This test procedure is different from our
normal preference test procedure.

The results of this new procedure can be plotted in a three
dimensional diagram with the two horizontal axes numbered in S/N
ratios or attenuator settings of ADD and MULT, while along the third,
vertical axis the percentages of the listeners who prefer ADD are given.
This plotting procedure would yield the relation between the two signals
as a three dimensional surface. In view of the difficulties of
using a three dimensional plotting scheme, it was decided to use
only two dimensional mapping. The presented signal pairs then correspond
to points of the area between the two reference axes and are
labelled with the respective percentage of listeners preferring ADD.

In view of this mapping we have decided to call the new test procedure
"area test".


The curve of intersection between the three dimensional surface
and a horizontal plane at 50 % listeners' preference represents the
"isopreference curve" of the two signals ADD and MULT under test.

In a real test there will be some "wrong" decisions in the
critical range close to the isopreference curve. In order to determine
the isopreference curve, cross cuts of the data surface perpendicular to
the ADD and MULT axes are made. In these cross cuts the isopreference
level together with a standard deviation is calculated as discussed in
Sec. 2.4. The desired isopreference curve will then be found as an
empirical approximation to those individually determined isopreference
points.
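
The cross cut evaluation can be illustrated by the Python sketch below. It is a simplification and not the procedure of Sec. 2.4: instead of fitting a normal distribution to each cross cut, it merely interpolates linearly to the 50 % level, and it therefore yields no standard deviation. The function names, the data layout (a dictionary from a pair of S/N ratios to the percentage of listeners preferring ADD) and the example grid are assumptions made for illustration.

```python
def crossing_50(points):
    """First S/N ratio at which a cross cut passes through 50 %, or None."""
    pts = sorted(points)
    for (x0, p0), (x1, p1) in zip(pts, pts[1:]):
        if p0 != p1 and (p0 - 50.0) * (p1 - 50.0) <= 0.0:
            return x0 + (50.0 - p0) * (x1 - x0) / (p1 - p0)
    return None


def isopreference_points(area):
    """Isopreference points from both families of cross cuts.

    area: dict mapping (snr_add, snr_mult) to the percentage preferring ADD.
    Returns two dicts: the ADD level of isopreference for each MULT level,
    and the MULT level of isopreference for each ADD level.
    """
    by_mult, by_add = {}, {}
    for (snr_add, snr_mult), percent in area.items():
        by_mult.setdefault(snr_mult, []).append((snr_add, percent))
        by_add.setdefault(snr_add, []).append((snr_mult, percent))
    along_add = {m: crossing_50(p) for m, p in by_mult.items()}
    along_mult = {a: crossing_50(p) for a, p in by_add.items()}
    return along_add, along_mult


# Invented data: preference for ADD grows with the S/N ratio of ADD
# and falls with the S/N ratio of MULT.
grid = {
    (a, m): max(0, min(100, 50 + 5 * (a - m)))
    for a in (0, 10, 20)
    for m in (0, 10, 20)
}
along_add, along_mult = isopreference_points(grid)
```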

As an example the results of such an area test, which has been
conducted with 10 listeners, shall be given. Fig. 3.8 shows the decisions
for the listener SCW and Fig. 3.9 shows the results for the whole group.
Fig. 3.9 is a computer printout where the preferences of the listeners
are mapped at the respective test points as explained before. The results
of the evaluation of both groups of distribution functions, or cross cut
approximations, as explained above, are plotted in Fig. 3.10. The mean
values of both groups of functions are distinguished by circles and triangles.
The isopreference curve approximates these points. Fig. 3.10 also gives
the corresponding standard deviations $\sigma$ of the distribution functions.

The reproducibility of these measurements is demonstrated in
Fig. 3.11. Five such area tests have been carried out at different times. The
results are shown as curves 1 to 5. The deviations are so small that we
feel entitled to call the mean of these curves the "standard" isopreference
curve between our signals ADD and MULT. All measured isopreference
curves were found to deviate less than about ± 2 dB from the standard
curve over a range of 20 dB of the multiplicative reference. This expected
uncertainty range of ± 2 dB is indicated by dashed lines in Fig. 3.11. When
the quality of the test signals increases towards hifi quality, the
deviations grow larger.

Parallel to the area tests, normal preference tests have also
been conducted in order to check and to verify the validity of the standard
isopreference curve. The two kinds of preference tests possible in
this connection have been performed and repeated: the additive
reference signal held constant, acting therefore as the test signal,
with the multiplicative reference being varied, acting therefore as
reference signal, and vice versa. Fig. 3.12
and Fig. 3.13 show the results for these two kinds of preference tests.
Mean values and standard deviations for groups consisting of about
8 listeners are given. Both kinds of tests show in the middle range
of the curve approximately the same values which are increasing
for good or bad speech qualities. All isopreference points are lying
within the uncertainty range defined above.


Fig.  3.  13 


Area  tests  with  the  digital  reference 

A few weeks ago, after finishing an experimental set-up for
the generation of the new "digital" reference DIG (Sec. 2.21), we started
to work with the new signal. At first it was tried to establish the
isopreference relation to the other reference signals ADD and MULT.

The results presented here are still preliminary, but fit quite well
into the scope of our data.

Several area tests were made in the same way as described in
the previous subsection. The results of the two tests with the signal pairs
DIG-ADD and DIG-MULT are shown in Fig. 3.14 and Fig. 3.15. The
tests have been conducted twice on different days with 5 to 7 listeners.


Fig. 3.14 (axis label: S/N DIG)

Fig. 3.15 (axis label: S/N DIG)


The standard deviations given in both directions for the isopreference
points seem to be somewhat larger than in the case of the ADD-MULT
relation of Fig. 3.10, but this should be confirmed by further tests.

A combination of the three relations ADD-DIG, DIG-MULT,
MULT-ADD will yield information about the transitivity of
preference measurements. This will be discussed in Sec. 3.2.


Area  tests  with  distorted  test  signals 

Normally one is not expected to be interested in degrading
the test signals, because one would want to test and to evaluate signals
as they are in practical use. But we used this easy possibility to
extend our speech material in order to get new dimensions and to
check already existing relations.

Several new test signals have been generated from the set
of our test tapes and have been used for area tests. The results
of four tests with the new test signals, normal vocoder multiplied
by noise (VON-MULT), normal vocoder plus additive noise (VON-ADD)
and highpass filtered speech plus additive noise (HP-ADD), in comparison
with some of our reference signals are shown in Fig. 3.16 - 3.19.

The test sessions have been attended by 7 to 10 listeners.


Fig. 3.16 - 3.19 (axis labels: S/N MULT, S/N ADD)

Obviously, the diagrams have to show that for very high S/N ratios
of the additional distortions no change of the isopreference level of
the test signal can be expected. But it is astonishing to see, e.g. in
Fig. 3.18, that the quality of our vocoder speech signal could not be
impaired by adding noise down to 6 dB S/N ratio.

This result shows that the influence of an additional degradation
upon an already degraded speech signal may be difficult to predict.
Effects of this kind deserve further study. They may give some
guidance as to whether it will be worthwhile to improve single
properties of a speech processing system under development. A
parallel may be found in noise control work, where it is well known
and easy to understand that isolated changes or reductions in a complex
noise signal will have practically no effect on the overall loudness of
this signal.

Rank  order  tests 


Rank order tests have been started recently. Questions
such as the following shall be studied:

The  ability  of  listeners  to  rank  order  systems  with  different  S/N  ratio, 

the definition of difference limens in speech quality,

the  relation  between  speech  quality  and  S/N  ratio  of  a  speech  signal. 

Instead  of  signal  pairs  single  speech  signals  with  different 
S/N ratios are presented to the listeners. At first the listeners hear
two  reference  signals,  one  with  very  good  and  the  other  with  very  bad 
quality.  Then  they  are  asked  to  rank  order  a  random  sequence  of  the 
above  speech  signals  between  the  first  two. 


At first two tests have been carried out for DIG as a variable
speech signal with groups of 7 listeners. The quality has been varied
in 10 incremental steps of 2 dB S/N ratio covering a total range of
20 dB. The listeners were requested to classify each of nine signals
after a single presentation of 5 seconds by giving marks between one
and nine to these signals, while mark zero for the best and mark ten
for the worst signal had been previously defined.
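
The marking scheme can be sketched in a few lines of Python. This is
only an illustration under assumptions not stated in the report: the
degradation is expressed in dB below the best reference, and a "wrong
decision" is counted as any pair of signals whose marks contradict
their S/N order. All names are hypothetical.

```python
from itertools import combinations

STEP_DB = 2    # step size of the first two tests
RANGE_DB = 20  # total range covered

# Degradation below the best reference: 0, 2, ..., 20 dB (11 signals).
degradations_db = list(range(0, RANGE_DB + 1, STEP_DB))

# The two extremes are the references (marks 0 and 10); the nine signals
# in between ideally receive the marks 1 to 9.
ideal_marks = {d: d // STEP_DB for d in degradations_db[1:-1]}

def count_wrong_pairs(marks):
    """Count signal pairs whose marks contradict their S/N order."""
    ordered = sorted(marks.items())  # sorted by increasing degradation
    return sum(1 for (_, m1), (_, m2) in combinations(ordered, 2) if m1 > m2)

# Hypothetical listener who interchanges the signals at 8 dB and 10 dB.
swapped = dict(ideal_marks)
swapped[8], swapped[10] = ideal_marks[10], ideal_marks[8]
```

An ideal listener produces no contradicting pair, while the single
interchange above produces exactly one.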


Fig. 3.20          Fig. 3.21


The results of the "best" and the "worst" listener are given
in  Fig.  3.20.  Nearly  every  listener  has  at  least  one  or  two  wrong 
decisions.  The  test  results  of  the  two  groups  with  7  and  5  listeners  are 
shown  in  Fig.  3.21. 

In a third test with the DIG signal and 5 listeners the incremental
degradation steps have been reduced to 1 dB. The total range remained
again 20 dB. Again the listeners had to use the marks one to nine
for their classification. The results of this test as mean values of all
5 listeners, shown in Fig. 3.22, are very confused and indicate by
no means a correct rank ordering of the 19 signals into 9 positions.
By plotting the first half of the classification of the presented signals
and then the following second one, two different curves are obtained
as given in Fig. 3.23. While the curve representing the first half of
the decisions shows the expected monotonic decrease similar to the
curves in Fig. 3.21, the second half has at least three wrong values,
is not in correspondence with the first curve and is therefore the reason
for the confusing shape of Fig. 3.22. The results give rise to the
supposition that the listeners are able to remember the initially presented
signals with the extreme qualities only during a limited number of
presentations of other speech signals. If, as in the present example,
more than 10 different signals within this quality range are presented,
or if the set of 10 signals is repeated, the listeners get more
confused in their judgements.

These  first  tentative  tests  show  that  we  are  still  far  from  an 
answer  to  the  questions  given  above  and  that  for  the  understanding  of 
rank  order  tests  further  studies  will  be  necessary. 

Pulse  delta  modulation 


With five speech signals generated by a pulse delta modulation
system with sampling frequencies from 7.2 kc to 120 kc (PD1 - PD5
from Sec. 3.22) preference tests have been performed with 6 listeners
and with ADD and MULT as reference signals. The mean
values and deviations σ for the whole group are given in Fig. 3.24
for both reference signals ADD and MULT. The results show the
expected rank ordering. It may be interesting to note that all
five delta modulation signals lie not on but just beside the
standard isopreference curve for ADD-MULT, entirely within the
uncertainty range of ± 2 dB which has been defined before.


Fig. 3.24

Digital  reference 


Among the first informative studies with the new reference
signal DIG, we made preference tests with the four standard test
signals VO2, VON, LP and HP. The results from a test with
5 listeners are given in Table 3.5. The mean values M of the single
listener decisions are listed and the mean values μ and corresponding
standard deviations σ of the group are given in the last two columns.
The mean values of the group and with it the points of isopreference
are in the same order as in tests with the ADD and MULT reference.
The σ values seem to be a little larger here, but for reliable
statements further measurements are needed.


Test              Listener (M)                      Group
signal     PA2    P?A    STE    G?Z    MAR        μ      σ

VO2        2.5    3.5    2.5    2.5    1.5        2.5    0.8
VON        4.5    4.5    0.5    2.5    8.5        3.9    3.0
LP         4.5    7.5    ?      ?      ?          6.3    3.5
HP         5.5    ?      ?      ?      ?          8.0    ?

(? = not legible in the source)

Table 3.5


3.12 Intelligibility

Intelligibility  is  a  very  important  factor  of  the  overall  quality 
of  speech  signals  and  it  is  of  special  interest  therefore  to  study  all 
our  test  material  also  with  respect  to  this  parameter. 

It has already been mentioned in Sec. 2.23 that we used
monosyllabic German words from HAHLBROCK's "Freiburger Wörterteste"
for our intelligibility tests. These lists are similar to the
English PB-word lists. Our four main test signals have been measured
with two different groups of about 8 listeners each on 6 different days.
A complete test list included about 100 words. All results obtained
have been averaged and are given in % intelligibility in Table 3.6.


Test signal            HP      LP      VON     VO2

Intelligibility [%]    89.2    78.8    65.3    69.2

Table 3.6


In contrast to the results from preference tests where VON showed
a higher isopreference level than VO2, here VO2 has an intelligibility
score of about 69 % and VON only one of about 65 %. The different
rank orders in view of these two aspects are an indication that
preference judgements can only partially be based on estimates of speech
intelligibility.

The  intelligibility  of  the  references  ADD  and  MULT  was  also 
measured.  At  different  S/N  ratios  of  ADD,  400  words  have  been  presented 
to  17  listeners  on  the  same  day  in  a  random  order.  The  results  are 
shown  in  Fig.  3.25.  A  smooth  curve  may  be  drawn  through  the  data 
points which starts at a S/N ratio of - 9 dB with a word intelligibility
of about 20 % and increases monotonically.


Fig. 3.25 (intelligibility [%] versus S/N of ADD)


At  a  S/N  ratio  of  +  2  dB  the  intelligibility  score  reaches  about  90  % 
and  is  then  obviously  tending  to  100  %  for  higher  values  of  the  S/N 
ratio, i.e. for nearly undistorted hifi speech signals.

In  the  same  way  Fig.  3.26  shows  results  from  16  listeners 
on  only  one  day  for  the  signal  MULT.  Here  the  smoothed  curve  starts 
at  a  S/N  ratio  of  -  12  dB  with  the  already  high  value  of  more  than 
87  %  word  intelligibility. 

That means that over the whole useful range of MULT, the
signal is highly intelligible. The multiplicative distortion signal alone,
i.e. hifi speech times noise, was found to have 88 % word intelligibility.


Fig. 3.26


As far as the signal DIG is concerned, until now we have only
a few values for the distortion signal alone. The dependence of
intelligibility on the clock frequency has already been shown in Fig. 2.10.
The results given in Table 3.7 repeat only that the word intelligibility
is as low as about 32 % at a clock frequency of 4 kcps.


Clock frequency [kcps]    ?       ?       4

Intelligibility [%]       ?       55      32

(? = not legible in the source)

Table 3.7

This  low  value  is  a  noticeable  advantage  for  the  purpose  of  degradation 
of  a  hifi  speech  signal  compared  with  the  high  value  of  88  %  of  the 
multiplicative  distortion  signal. 
3.13 Loudness

With respect to our pair comparison preference tests there
are two principal loudness measurement problems:

The  determination  of  the  optimum  loudness,  i.e.  actually  the  optimum 
speech  level  for  the  presentation  of  a  certain  signal. 

The  S/N  ratio  of  a  reference  signal,  i.e.  the  loudness  or  level  relations 
of  a  speech  signal  and  its  accompanying  distortion  signals. 

Before these two questions can be addressed directly, the
intricate problem of defining and discussing the terms in this area has
to be attacked.

Level  definitions 


In  loudness  and  level  problems  we  have  to  discriminate  between 

a)  Level  measurements  e.g.  optimum  speech  level,  or  S/N 
ratio  of  reference  signals 

b)  Level  adjustments  e.g.  quick  reproduction  of  a  certain  test 
condition. 


The simplified block diagram of the test set-up in Fig. 3.27
may serve for the following explanation of the check points and the
relations between the respective levels.


Fig.  3.  27 


ad a) Measurements of all sound pressure levels (SPL) produced
by the earphones are made with a condenser microphone and amplifier
according to the arrangement described in Sec. 2.34. The results
obtained from these measurements are given in dBA and will be
referred to as speech level (SL), pilot level and distortion level for
the corresponding A-weighted SPLs of speech signals, pilot tones,
or distortion signals.

The  subjective  determination  of  the  optimum  speech  level 
will  be  described  in  the  next  subsection.  The  measurement  of  the 
S/N  ratio  of  reference  signals  depends  on  the  nature  of  the  respective 
distortion  signals.  The  first  term  of  the  S/N  ratio 

S/N  =  speech  level  -  distortion  level 

is the speech level, which can be measured only with a very low
accuracy of about ± 3 dBA. The level of the additive distortion signal
is a noise level and therefore measurable with sufficient accuracy.

The  distortion  level  of  both  multiplicative  references  MULT  and  DIG 
is  the  dBA  value  of  the  term  "speech  times  noise".  In  these  cases 
the  accuracy  of  level  measurements  would  be  also  very  low.  We 
therefore  used  a  pilot  tone  recorded  onto  the  same  tape  for  measuring 
more  accurately  the  level  of  a  fictive  distortion  signal  :  "pilot  tone 
times  noise"  instead  of  "speech  times  noise"  and  are  now  able  to  replace 
the  S/N  ratio  for  both  multiplicative  reference  signals  by 

S/N  =  pilot  level  -  fictive  distortion  level. 

This ratio is of course measurable with a much higher accuracy than
the other one using the speech levels. The determination of the S/N
ratio in this way is much more precise than that of the additive reference
signal. This is another important advantage of the multiplicative
reference signal in comparison with the additive one.

ad b) For simple and quick adjustments of any level during a
test run we use a graphic level recorder connected to the transmission
line from the program switch to the headsets. Level readings from
this recorder are given in dBLR (level recorder) which are rms-voltage
levels referring to 10 mV. Whenever possible all level measurements and
adjustments of speech or distortion signals are carried out with the
pilot tone and the fictive distortion signal derived from this pilot tone
utilizing the level recorder. The justification for this simplification
is based on the fixed relation between the pilot level (dBA) and the
level of the pilot tone as a reading on the level recorder, which may
be called sinus level (dBLR). This constant difference between pilot
level (dBA) and sinus level (dBLR) is 74 dB, because 4 dBLR, i.e. 20 mV
of a 1 kcps sinus voltage, produce an A-weighted SPL of 78 dBA on the
earphones.


The relation between pilot level (dBA) and speech level (dBA)
depends upon the recording conditions and therefore is different
for different speech signals. Examples for our hifi speech and VON
recordings are given in Table 3.8.


Signal    Sinus level    Pilot level    Speech level
          dBLR           dBA            dBA

hifi      4              78             71
VON       10             84             76

Table 3.8
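
As a sketch of the bookkeeping, the fixed 74 dB difference and the
recording-dependent difference between pilot level and speech level
may be combined as follows. The numerical values are those quoted in
the text and in Table 3.8; the function and variable names are
hypothetical.

```python
# Pilot level (dBA) minus sinus level (dBLR), as stated in the text.
PILOT_MINUS_SINUS_DB = 74

# Pilot level (dBA) minus speech level (dBA) for each recording,
# taken from the hifi and VON rows of Table 3.8.
PILOT_MINUS_SPEECH_DB = {"hifi": 78 - 71, "VON": 84 - 76}

def pilot_level_dba(sinus_level_dblr):
    """Pilot level in dBA for a given level recorder reading in dBLR."""
    return sinus_level_dblr + PILOT_MINUS_SINUS_DB

def speech_level_dba(signal, sinus_level_dblr):
    """Speech level in dBA of a recording for a given pilot tone reading."""
    return pilot_level_dba(sinus_level_dblr) - PILOT_MINUS_SPEECH_DB[signal]
```

With these relations a reading of 4 dBLR on the pilot tone of the hifi
recording corresponds to a pilot level of 78 dBA and a speech level of
71 dBA.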


During a preference test these speech levels are held constant, but
in case of tests concerning optimum loudness it is necessary to vary
the level of the speech signal. The speech level may be varied by
attenuator 1 (Fig. 3.27). The relation between the speech level,
when attenuator 1 is set to zero, and the respective attenuator setting
for the actually desired speech level SL is given by the equation

SL = speech level(attenuator = 0) - attenuator setting

In preference tests we have to measure S/N ratios in terms of which
the reference quality is given. The S/N ratio of a reference signal

r(t) = s(t) + k · d(t)

may be defined by its levels as

S/N(level definition) = speech level(dBA) - distortion level(dBA)

The value of the constant factor k and thereby the distortion level
can be changed by attenuator 2 (Fig. 3.27). Assuming that for k = 1
attenuator 2 is set to zero, one can derive the following equation

S/N = speech level - distortion level(attenuator = 0) + attenuator
setting
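
The two attenuator relations can be written down directly. The
following lines are a minimal sketch; the function names and the
example values in the note below are hypothetical, and all quantities
are levels in dBA or attenuations in dB.

```python
def speech_level(speech_level_at_zero, attenuator_1):
    """SL obtained when attenuator 1 is moved away from its zero setting."""
    return speech_level_at_zero - attenuator_1

def signal_to_noise(speech_level_dba, distortion_level_at_zero, attenuator_2):
    """S/N (level definition) for a given setting of attenuator 2."""
    distortion_level = distortion_level_at_zero - attenuator_2
    return speech_level_dba - distortion_level
```

For example, a recording with 76 dBA at zero attenuation is brought to
71 dBA by 5 dB on attenuator 1, and a distortion signal of 71 dBA at
zero attenuation then gives S/N = 6 dB when attenuator 2 is set to 6 dB.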

Now the initial two questions of this section shall be discussed.

Optimum loudness


In our version of the isopreference method all speech signals,
i.e. test or hifi signals, are presented to the listeners in a setting of
optimum loudness. This loudness is determined separately for each
signal in a previous subjective test. The principle of this simple
loudness comparison is the same as for preference testing and consists
in the alternate presentation of the same speech signal with
different speech levels (SL) only. The listeners now have to choose the
sample within a signal pair with the preferred loudness. A simplified
block diagram in Fig. 3.28 shows the test set-up. The speech signal
is transmitted to the program switch over two separate channels, each
containing attenuator and amplifier. The program switch generates
two repeated sample pairs ABAB or BABA of the same speech
signal, but with different SL. The SL of sample B is below that
of sample A by a constant difference. Over a suitable loudness range
sample A is now varied in incremental steps of 1 dBA in a random
order. Presentation of ABAB and BABA is randomized also.


Fig. 3.28


We found that 6 dB is a suitable value for the constant
difference between A and B. A smaller loudness difference (4 dBA)
yielded too many wrong decisions, because the listeners have
difficulties in discriminating the loudness of both speech samples. A
larger value (8 dBA) expands the uncertain range of the optimum
loudness value and therefore reduces the accuracy of the obtained
results.


The following fictive example deals with a set of error-free
data, which might have been gained from an ideal listener without any
"wrong" decision as results from one test run. The speech signal
under test shall really have an optimum loudness which is assumed
to be 70.5 dBA. Then the listener will always prefer from all pairs
the speech sample with the level closer to this optimum. Ideal data
of this kind are shown in Fig. 3.29.


The abscissa is numbered in SL (dBA) of the speech samples, the
ordinate gives the preferences of the ideal listener at this SL.

The middle of the resulting "block" formed by the SL values
which have been preferred two times is now defined to be the optimum
SL of the speech signal. Experience shows that the optimum loudness
of a signal is not very critical and one should speak rather of an
optimum loudness range.
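
The fictive example can be reproduced with a short simulation. It is a
sketch under assumptions: sample B lies 6 dB below sample A (a value
between the rejected differences of 4 and 8 dBA), sample A covers the
hypothetical range from 64 to 84 dBA in steps of 1 dBA, and the
listener is ideal. All names are hypothetical.

```python
from collections import Counter

def ideal_preferences(optimum_dba, a_levels, difference_db=6):
    """Count how often each SL is preferred by an ideal listener."""
    counts = Counter()
    for a in a_levels:
        b = a - difference_db  # sample B is below sample A
        # The ideal listener picks the sample closer to the optimum.
        if abs(a - optimum_dba) < abs(b - optimum_dba):
            counts[a] += 1
        else:
            counts[b] += 1
    return counts

def block_of_double_preferences(counts):
    """SL values which have been preferred two times."""
    return sorted(level for level, n in counts.items() if n == 2)

counts = ideal_preferences(70.5, range(64, 85))
block = block_of_double_preferences(counts)
optimum_sl = (block[0] + block[-1]) / 2  # middle of the block
```

Each SL occurs once as sample A and once as sample B, so no value can
be preferred more than twice. The block then extends from 68 to 73 dBA
and its middle returns the assumed optimum of 70.5 dBA.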

Table 3.9 and Fig. 3.30 show as an example the results
of a real test with 8 listeners for our hifi speech signal. All pairs
were presented twice in a random order; one has therefore at least
two decisions for each value of the SL. The evaluation of this test
yielded an optimum SL of 71 dBA. This SL of 71 dBA corresponds
to 78 dBA pilot level and 4 dBLR sinus level as defined earlier.

Besides the data for the hifi speech signal, the optimum speech levels
of three other systems are given in Table 3.10. Additionally the respective
pilot levels and sinus levels are listed.


Signal    Speech level    Pilot level    Sinus level
          dBA             dBA            dBLR

hifi      71              78             4
VON       ?               84             10
HP        71              ?              ?
LP        75              84             ?

(? = not legible in the source)

Table 3.10


The  problem  of  level  definitions  in  normal  preference  tests 

The  idea  of  our  preference  method  is  to  measure  the  quality 
of  a  speech  signal  in  terms  of  the  S/N  ratio  of  a  compared  reference 
signal.  At  least  as  far  as  the  additive  reference  signal  is  concerned, 
the determination of the S/N ratio is in close relation with the
measurement of the loudness of speech.

As long as there exists neither a precise definition nor an
accurate measuring procedure for the speech level, one has at least
two obvious possibilities to define the S/N ratio. First, one may
use, e.g. the A-weighted SPL of both speech and distortion signal,
where the noise level is exactly measurable, while the speech level
has to be specially defined and may be measured, e.g. as we postulate
it in Sec. 2.34. The other possibility is to compare speech and noise
signals directly in a subjective loudness test and to define the S/N
ratio to be zero, when speech and noise signal are judged to be equal
in loudness. Our experiments described in this subsection shall
illustrate the relation between speech and corresponding distortion
signal, i.e. between both of the S/N ratios mentioned above. The
results shall show whether or not a subjective determination of the
S/N ratio is useful. As long as there is no better definition or measuring
procedure proposed, it might be adopted during our studies in order
to have an operational definition of the S/N ratio of a distorted speech
signal.


We  made  experiments  to  compare  the  speech  signal  with  its 
corresponding  distortion  signal  in  a  loudness  pair  comparison  test, 
similar  to  the  method  described  above.  During  one  test  run,  speech 
signal  A  is  held  constant  at  a  certain  level,  while  distortion  signal  B 
is  presented  with  different  levels  in  a  random  order.  The  covered 
range  is  about  12  dBA  in  steps  of  1  dBA.  A  simplified  block  diagram 
of  the  test  set-up  is  given  in  Fig.  3.  31. 


Fig.  3.  31 


Speech  signal  and  distortion  signal  are  sampled  by  the  program  switch 
in  the  usual  ABAB  series  and  presented  over  headphones.  The  listeners 
are  requested  to  determine  from  each  repeated  pair  the  signal  with  the 
greater loudness. In this way one obtains, similarly to the point of
isopreference in a preference test, the one level of the distortion
signal which is found to be equal in loudness with the respective
speech level. Thus one may express the speech level in terms of the
measurable level of the distortion signal which has been judged to be
equally loud.

The reproducibility of the results is obviously quite good;
the values of σ are entirely within the range of the usual deviations
which have to be expected in subjective measurements.

Fig. 3.32 shows an approximately linear relation between
hifi speech and pink noise. The optimum hifi speech level of 71 dBA
corresponds to a pink noise level of about 66 dBA. One can see that
for the additive reference signal presented with optimum hifi speech
level, the difference between the two definitions of the S/N ratio given
above is approximately 5 dB, or written as an equation:

ADD : S/N(loudness definition) = S/N(speech level definition) - 5 dB

Similar measurements carried out for the multiplicative distortion
signal yielded a corresponding value of 74 dBA judged as equally loud
as 71 dBA hifi speech level, or in form of an equation

MULT : S/N(loudness definition) = S/N(speech level definition) + 3 dB

The corresponding values for the digital distortion signal yield the
relation: 71 dBA hifi speech level is judged as equally loud in comparison
with 75 dBA, or

DIG : S/N(loudness definition) = S/N(speech level definition) + 4 dB
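
The three correction terms follow from one subtraction each, which the
following sketch makes explicit. The levels are those quoted in the
text (the pink noise value is given there only as "about 66 dBA"); the
names are hypothetical.

```python
OPTIMUM_HIFI_SPEECH_LEVEL_DBA = 71

# Distortion level (dBA) judged equally loud as the optimum hifi
# speech level, for each reference signal.
EQUALLY_LOUD_DISTORTION_DBA = {"ADD": 66, "MULT": 74, "DIG": 75}

def loudness_correction_db(reference):
    """S/N(loudness definition) minus S/N(speech level definition)."""
    # At equal loudness the loudness S/N is zero by definition, while
    # the level S/N equals speech level minus distortion level.
    level_snr_at_equal_loudness = (
        OPTIMUM_HIFI_SPEECH_LEVEL_DBA - EQUALLY_LOUD_DISTORTION_DBA[reference]
    )
    return -level_snr_at_equal_loudness

def snr_loudness_definition(reference, snr_level_definition):
    """Convert a level-defined S/N into the loudness-defined S/N."""
    return snr_level_definition + loudness_correction_db(reference)
```

The function returns - 5 dB for ADD, + 3 dB for MULT and + 4 dB for
DIG, in agreement with the three equations above.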

3.14 Human Factors


Obviously  the  main  difficulty  when  running  subjective  tests 
with  a  group  of  listeners  is  the  influence  of  human  factors,  most  of 
which  may  differ  from  listener  to  listener  and  are  therefore  hard  to 
identify and to consider. Several factors such as momentary personal
condition, disposition or motivation may be statistically different for
single listeners and will therefore be eliminated to some extent by
testing a group of listeners and forming the mean values of their
decisions. Other factors, e.g. hysteresis, check-on-order effect,
fatigue, or training may be common to the group. If possible these
factors have to be studied separately and their influence on the test
results has to be taken into consideration.

Hysteresis  and  randomization 


To  determine  a  point  of  isopreference,  a  listener  compares  a 
fixed  test  signal  with  a  reference  signal  the  quality  of  which  is  varied. 

The  question  arises  how  the  reference  quality  in  a  range  about  the 
point  of  isopreference  should  be  varied.  Suppose,  one  would  improve 
the  reference  quality  step  by  step  from  a  value  which  is  certainly 
worse  than  the  isopreference  quality,  to  a  value  which  is  certainly 
better than this quality. Then up to the isopreference level an ideal
listener would have only decisions which favor the test signal and
beyond this point only decisions for the reference signal. Corresponding
results  would  be  obtained  if  the  test  sequence  were  reversed  from  good 
to  bad  reference  quality.  The  listeners'  decisions  then  would  first 
favor  the  reference  signal  and  below  the  isopreference  level  favor  the  test 
signal. 


A real test with monotonically increasing (↑) and then
decreasing (↓) reference quality yields different results. The results
from an ideal listener in comparison with the decisions of real
listeners are shown in Fig. 3.33. Being confronted with pairs consisting
of test signal B and monotonically varied reference signal A, the
listener notices at a certain moment that he has to change his decisions
from A to B or from B to A and then he remains with the new decision.
Disregarding the case of "ideal" decisions, the point where the listener
changes to the other signal will be different, when presenting a row of
pairs with monotonically decreasing and increasing reference qualities.
That means the decisions of the real single listener show a certain
hysteresis which may have maximum values of 4 - 6 dB and average
values of 2 - 3 dB. Of course the hysteresis may degenerate to zero,
when the listener has a chain of "ideal" error-free decisions. The
hysteresis may be explained as a range of uncertainty and may be
caused by an accommodation during a single test series, or by lack of
attention of the listener. The hysteresis effect can be avoided by a
randomization of the test material.
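
The hysteresis of a single listener can be read from the two decision
series as the distance between the two change-over points. The sketch
below uses invented decision data for a hypothetical listener; the
reference S/N range of 0 to 12 dB and all names are assumptions.

```python
def change_over_point(levels, decisions):
    """Midpoint between the two levels where the decision changes."""
    for i in range(1, len(decisions)):
        if decisions[i] != decisions[i - 1]:
            return (levels[i - 1] + levels[i]) / 2
    return None  # the listener never changed his decision

# Increasing reference quality: the hypothetical listener stays with
# test signal "B" until the reference S/N reaches 8 dB.
rising_levels = list(range(0, 13))
rising = ["B" if s < 8 else "A" for s in rising_levels]

# Decreasing reference quality: he stays with reference "A" until the
# reference S/N falls below 5 dB.
falling_levels = rising_levels[::-1]
falling = ["A" if s >= 5 else "B" for s in falling_levels]

up = change_over_point(rising_levels, rising)
down = change_over_point(falling_levels, falling)
hysteresis_db = up - down
```

For these invented data the change-over points lie at 7.5 dB and
4.5 dB, i.e. the hysteresis amounts to 3 dB. An ideal listener would
change at the same level in both directions and show no hysteresis.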


Additional studies were made on the influence of preceding
decisions, when presenting the reference qualities randomly. Decisions
in the critical range around the isopreference level have been marked
according to whether the preceding reference signal was of very bad
quality or of very good quality. In six different tests we counted the
decisions in the critical range in dependence of the quality of the
preceding reference signal. The results show that presenting the
reference signal in a random order to the listeners is sufficient to
make the decisions practically independent from the preceding values
of the reference.

Check-on-order effect

The mode of presentation of the test material has been
discussed in Sec. 2.1. Here the question shall be examined whether
differences in the results have to be expected when the repeated
signal pairs ABAB (abbreviated AB) are reversed to BABA (abbreviated
BA). Systematic differences would indicate that there exists a
"check-on-order" effect which means that the listeners have the
clear tendency to be more influenced by the last speech sample.

Two different cases could be discriminated. In one case there
are negligible differences between AB and BA results, in the other
case we found differences which are by no means negligible, but have
to be considered or may perhaps be avoided by randomizing not only
the reference quality but also the sequence of presentation.


The first case shall be illustrated by a sequence of normal
preference tests with 6 listeners on one day with VON and HP as
test signals and ADD and MULT as reference signals. In a first
test run the mode of presentation has been AB and in a second test
run BA. The results of these presentations of both test sequences
AB and BA are given in Table 3.11. Here we can see that the
differences are indeed very small. An explanation for this is
that in this type of preference test the test signal is held constant.
During a test run the listeners seem to form a concept of the constant
quality of the test signal in their mind. With this concept, it makes
nearly no difference whether this test signal is the first speech sample
or the second one of a pair. Therefore the test results are nearly
the same in the two cases AB and BA.


Table 3.11

The second case, a pronounced check-on-order effect, was
found to be possible in area tests between the reference signals, e.g.
ADD and MULT. This test is described in detail in a previous
subsection. In a first test run we always presented the AB sequence
MULT-ADD and in a second run the sequence ADD-MULT with random
variations of both speech qualities. The results of two different test
sessions on two days with 5 to 7 listeners have been averaged and
are shown in Fig. 3.34. The results from AB and BA presentation
differ in a systematic manner. The first isopreference relation
MULT-ADD corresponds quite well with the standard isopreference
curve. The other data are still lying within the previously given
uncertainty range, but have in the special case of the ADD-MULT
relation a nearly constant difference of about 2 dB in one direction.
Further examples with random presentations of AB and BA pairs
are given in the Appendix. Both isopreference relations ADD-DIG
(and reversely DIG-ADD) and MULT-DIG (DIG-MULT) show the same
tendency of favoring the speech sample presented finally. The differences
are not constant as in the case of ADD-MULT, but may increase
to values of about 4 - 5 dB.

The examples show that the check-on-order effect may
affect the test results, if the speech samples are not presented in a
random sequence. Our explanation is that in area tests there is no
constant test signal because both speech signals are varied randomly in
quality. The listeners have no chance to form a stationary concept of
the test speech quality and are then actually favoring the last presented
speech sample.


Fig. 3.34


Fatigue

We are not able to identify a systematic variation of test
results from the beginning to the end of a session when considering
equal tests. The scattering of the results and the reproducibility
of a certain test is quite the same if the test is repeated within a
test session or on the following day. After a usual test session of
about 2 hours the listeners show no noticeable fatigue which might
cause additional variations of the test results.
The dependence on time of the measurement results obtained
is a further question concerning human factors. Reproducible results
are an important requirement for the reliability of a test procedure.
The reproducibility is influenced by several subjective factors like
training and learning, or accommodation to a system or to a certain
test signal. Of course there is also an influence caused by the
inconstancy of the decisions of a single listener due to personal
conditions which may have varied. But this last factor cannot be taken
into account, for it differs from one listener to another and has to be
eliminated by averaging the results of a listener group.

Most subjective measurements require a certain training
of the testing group, so that the listeners become well experienced
with the test procedure. Our studies have shown that after a few
preference test runs the listeners are familiar with the test
procedure. Even the first results of preference tests yield
unambiguous decisions for most of the listeners, i.e. decisions
preferring test signal B and reference signal A are clearly separated
without any noticeable uncertainty range. Such results may be found
already in the very first test run. They are considered to be striking
evidence for the certainty of the single listener in making his
decisions.

Many of the listeners again make unambiguous decisions
when a test is repeated on the following days and weeks. But the
isopreference level for single listeners as well as for the whole
group may be somewhat different from those of previous test
sessions. The listeners are then again quite sure in making their
decisions, but they must have changed their minds about one or both
of the presented speech signals.


Considering the theoretical case that these variations are
monotonic, we may distinguish different possibilities,
i.e. whether the speech signals are judged to be better, worse, or
equal compared with the previous test session (Fig. 3.35).


If the isopreference level increases, this may be due to
better acceptance of the test signal or to higher annoyance created
by the reference signal in subsequent test sessions. It is still an
open question whether it can be decided which kind of such
accommodation effects occurs.


In order to get evidence for a study of the long-time variability
of our listeners we have frequently repeated preference tests with
the four standard test signals and the reference signals ADD and MULT.
The results given in Figs. 3.36 - 3.39 cover a period of about two
months. The abscissa is numbered in test sessions which are
approximately one week apart. The results of each reference signal
with the four test signals are given in a separate diagram.
The examples show that the establishing of confidence
criteria for single listeners and for groups as function of time and
training is much more complex than anticipated. The accommodation
to the test material is not the same for the different speech signals.
This is true for the single listeners as well as for the group. Testing
group 2 shows mainly increasing curves while group 1 has a slight
tendency to have lower isopreference levels after a period of two months.
There is only one special characteristic common to all results
of single listeners and listener groups, which is demonstrated in
Fig. 3.40. At the beginning of the tests the four signals were found to
be clearly distinguishable: the HP and VO2 systems were 6-10 dB apart
in terms of the reference system. Over a period of 2 - 3 months about
30 tests have been executed. Now the HP and VO2 systems are only
3 - 4 dB apart.

Fig. 3.40

An explanation of this effect is that the listeners get more and more
accustomed to the frequently presented test material. This reduction of
the isopreference level differences seems to be an effect of overtraining.
It is interesting to see that in spite of the reduced differences the
rank order of the signals is mostly preserved.
Discussion of Preliminary Results

In the previous Section 3.1 several aspects of our study
have been discussed in some detail. In this section we want
to summarize the main positive results and also the main difficulties
we encountered in the past research period. The section is
divided into three parts: transitivity, training, and intelligibility
versus preference. The first part is positive and affirmative; the
other two parts reveal problems which will necessitate further work
before final statements are possible.
Transitivity

Our study shall establish a reliable procedure for rating
speech signals in view of their quality in terms of the hypothetical
one-dimensional scale for the parameter "preference". Transitivity
is a basic requirement for the existence of a one-dimensional rating
scale. Transitivity exists if, whenever the signal pair A and B and
the signal pair B and C have both been found to be isopreferent, the
signal pair A and C is also found to be isopreferent.

A first general test of transitivity is possible using the
standard isopreference curve for ADD-MULT together with the
isopreference levels for four standard test signals expressed in
terms of the reference signals ADD and MULT. Fig. 3.41 shows the
isopreference points together with the respective σ values of the
test signals HP, LP, VON and VO2 in an ADD versus MULT diagram
for a trained group. A transitivity check is given by the distances
between these points and the standard isopreference curve plotted
into the same diagram. All four points lie, as they should, within
the postulated uncertainty range of ± 2 dB.
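
The distance check described above can be sketched in Python as follows. All numbers are hypothetical placeholders and are not taken from the measurements; it is also an assumption of this sketch that the distance is measured along the MULT axis and that the standard curve is interpolated linearly between its measured points.

```python
def interpolate(curve, x):
    """Piecewise-linear interpolation through (x, y) points sorted by x."""
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the measured range of the curve")


def within_uncertainty(curve, point, tolerance_db=2.0):
    """True if the point lies within +/- tolerance_db of the curve."""
    add_db, mult_db = point
    return abs(interpolate(curve, add_db) - mult_db) <= tolerance_db


# Hypothetical standard isopreference curve: (S/N ADD, S/N MULT) in dB.
standard_curve = [(0.0, -5.0), (10.0, 3.0), (20.0, 10.0), (30.0, 15.0)]

# Hypothetical isopreference points of two test signals.
print(within_uncertainty(standard_curve, (10.0, 4.0)))   # 1 dB from the curve
print(within_uncertainty(standard_curve, (15.0, 10.0)))  # 3.5 dB from the curve
```

A point that fails this check indicates either a violation of transitivity or a measurement uncertainty larger than the postulated range.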
Fig. 3.41


As a comparison to the results of the small trained group
on one day, the isopreference points of a large untrained group of
about 40 listeners are plotted in Fig. 3.42. The rank order of
the test signals is, with both reference signals, the same as for the
trained group and corresponds also with the rank order of the signals
given by the listeners when asked directly. Both vocoder signals VON
and VO2 lie here outside the uncertainty range.
One plausible explanation for this effect may be the following. At
first the listeners were presented all tests with the multiplicative
reference. The unfamiliar character of the vocoder signals caused
very low isopreference levels. In the subsequent tests with the
additive reference the listeners were already somewhat accustomed
to the vocoder signals and judged them to be better than before.
At the moment we have no other test data to verify the above
assumption.


Two further transitivity checks can be made using the
isopreference relations determined by area tests in Sec. 3.
Fig. 3.43 shows the relations between the three reference signals
ADD, MULT and DIG. All solid curves are direct measurement
results, while the dashed curve ADD-MULT is derived from the
two isopreference curves DIG-ADD and DIG-MULT. The close
correspondence between the two ADD-MULT curves, derived from
different tests, also speaks for the transitivity of the reference
signals.
In the same way Fig. 3.44 shows the isopreference relations
between the two reference signals DIG and ADD and the artificial test
signal VON-MULT. The solid curves again are determined by direct
tests while the dashed curve ADD-DIG is deduced from the isopreference
curves (VON-MULT) - DIG and (VON-MULT) - ADD. The correspondence
between the two curves ADD-DIG is not as good as in the example
given above. But considering the still insufficient data which we
have collected for the digital reference signal, the dashed curve
seems to fit with a reasonable tolerance.
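
Deriving an isopreference relation through a common third signal, as done for the dashed curves, can be sketched in Python as follows. The numbers and the name "setting" for the parameter of the common signal are hypothetical; the sketch only assumes that both direct relations were measured at the same settings of the common signal.

```python
def derive_relation(common_to_first, common_to_second):
    """Pair the levels of two signals that are isopreferent to the same
    setting of a common third signal (transitivity assumed)."""
    shared = sorted(set(common_to_first) & set(common_to_second))
    return [(common_to_first[s], common_to_second[s]) for s in shared]


def max_deviation(derived, direct):
    """Largest difference in dB between derived and directly measured levels,
    compared at equal levels of the first signal."""
    direct_map = dict(direct)
    return max(abs(direct_map[a] - b) for a, b in derived if a in direct_map)


# Hypothetical data: DIG setting -> isopreferent S/N in dB.
dig_to_add = {1: 5.0, 2: 12.0, 3: 20.0}
dig_to_mult = {1: 1.0, 2: 7.0, 3: 13.0}

derived_add_mult = derive_relation(dig_to_add, dig_to_mult)
direct_add_mult = [(5.0, 0.0), (12.0, 8.0), (20.0, 14.5)]

print(derived_add_mult)
print(max_deviation(derived_add_mult, direct_add_mult))
```

If the largest deviation stays within the postulated uncertainty range, the derived and the direct curve are compatible with transitivity.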


It has already been pointed out that a great part of our
preference measurements has been concentrated on test repetitions
with the four standard test signals HP, LP, VON and VO2 and the
reference signals ADD and MULT. The results discussed in a
separate part of Sec. 3.14 cover a period of about two months
and show, for a single listener and for two different groups, the
dependence on time of the obtained test results. The examples
are taken from five different groups of trained listeners, each
consisting of about 8 listeners. Now, disregarding the dependence on
"time", the mean test responses of the single groups shall be
calculated. These tests cover a period of about 7 months during which
each group has made 4-25 preference tests with all test signals
combined with both reference signals. The averaged isopreference
points for each of the 5 groups and the four test signals are shown
in an ADD versus MULT diagram in Fig. 3.45. Additionally the
standard isopreference curve with its uncertainty range can be
found in the diagram. The points are widely spread, but lie with
only three exceptions within the given uncertainty range.
The available material shows that a human observer has
a subjective yardstick of speech quality which changes "randomly"
in time and which may be seriously affected by training. We found
only very few references in the bibliography with respect to the
dependence on time of psycho-acoustic measurements in general,
although this effect might have an influence on practically all
subjective notions. In standards for intelligibility testing the
necessary time for training is defined by the period after which the
test scores have reached stationary values. But based upon our
preference test data one can see that it is nearly impossible to speak
at any time of a "steady state" of the test results. Therefore we
cannot specify here a time after which an untrained listener may be
qualified as being trained. Further tests will be required to clarify
this problem.

Intelligibility and Quality

Until recently intelligibility was the only aspect of speech
quality which has been used as a criterion for rating speech
communication systems in view of their performance. It is a necessary
part of the characterization of speech signals but does not give
a sufficient description. This becomes obvious when signals with
intelligibility scores close to 100 % are compared. We try to describe
the speech signal in terms of intelligibility and the relative quality
measure "preference".


Intelligibility and preference are by no means independent
of each other, for the first term seems to be involved in whole
or in part in the second one. A key question of our studies turned
out to be the relation between intelligibility and preference. Based
upon a body of as yet insufficient data we are not able to answer the
question to its full extent. Furthermore it must be mentioned that
we studied the intelligibility of isolated words, but used a continuous
text for preference testing. Some available results are shown in
Fig. 3.46. Besides the standard isopreference curve, the diagram shows
the relation between both reference signals with regard to
equal intelligibility. This "isointelligibility" relation has been
derived from the measurements in Sec. 3.12. Additionally the
results of preference and intelligibility measurements for the
test signal HP are given.

Fig. 3.46

Because of the very high intelligibility of the multiplicative
reference signal, the points of isopreference and isointelligibility
of the HP signal are the only corresponding points now available
between the two curves. Even from the present insufficient data,
significant differences between the isopreferent and isointelligible
signals may be presumed. Further studies are necessary.


Conclusions

The following list of brief statements refers to the work
of the past and also to the future research period. It tries to show
where we stand at the moment.

a)  Preference tests according to the proposed method are
feasible.

b)  A close correlation between two types of reference signals
could be established, namely hifi speech distorted by adding noise,
and another one by multiplying the speech signal with noise.

c)  Measurements  of  test  signals  with  the  two  reference  signals 
give  comparable  results.  Transitivity  of  preference  judgements 
could  be  proved  to  exist  with  reasonable  tolerances. 

d)  Standardization  of  a  method  for  preference  testing  seems 
to  be  possible,  though  we  could  not  yet  identify  an  "ideal" 
reference  signal. 

e)  The  establishing  of  confidence  criteria  for  single  listeners 
and  for  groups  as  function  of  time  and  training  turned  out  to 
be  more  complex  than  anticipated. 

f)  Intelligibility  is  not  simply  related  to  preference.  These 
two  criteria  may  give  different  rank  orders  for  a  set  of 
speech  signals. 

g)  The  results  of  the  past  period  encourage  us  to  continue 
the  subject  study  along  the  same  general  lines. 
Fig.: Listener Room, Operator Room, Listener Set


INSTRUMENTATION

REVOX             Tape recorder, model G 36
                  7 1/2 (3 3/4)" per sec, S/N > 53 dB
                  Frequency response 40 - 18 000 cps ...

McINTOSH          Power amplifier MC 225
                  twin-amplifier: 25 watts continuous per channel
                  18 - 30 000 cps +0/-0.1 dB

KOSS              Professional headphones, Model No. PRO ...
                  30 - 20 000 cps, impedance 50 ohms

BRUEL & KJAER     Random noise generator type 1402
                  20 - 20 000 cps linear or - 3 dB/oct weighting,
                  external filter

BRUEL & KJAER     Band pass filter set ...
                  weighting networks A, ...

...               ... with logarithmic potentiometer 50 dB

...               Microphone amplifier type ...
                  Cathode follower type ...
                  Condenser microphone type ...

HEWLETT-PACKARD   Attenuator set 350 D


Fig.: Random Pulse Generator (continued)
PROGRAM SWITCH

The program switch serves to get the desired mode of
signal presentation. Block diagram and corresponding time plans
are shown on the next page. The duration of presentation T_s is
equal for both speech signals A and B and is given by the formula:

    T_s = T_a - 0.5  (sec)

where T_a is the period of the astable multivibrator aMV. The
earphones are cut off for 0.5 seconds whenever speech signals are
switched over. This short pause corresponds to the period of a
monostable multivibrator (mMV) which causes the pick-up of the
pause relay over circuit "logic II". The switch-over of the speech
signals is done by the signal relay, which is controlled by a bistable
multivibrator (bMV). A counting circuit formed by two further bistable
multivibrators counts the presented signal pairs. Selectably after
one, two or three presented signal pairs the circuit "logic I" is able
to excite a fourth bistable multivibrator, which produces the reset of
aMV and of the other three bMVs and, over logic II, the cut-off of the
earphones by the pause relay. After a starter impulse this bMV flips
and opens aMV to the signal bMV, and the process is repeated.
Switching of S enables an automatic control of the pause duration by
the second monostable multivibrator mMV 2. Three small lamps "A", "B",
and "PAUSE" are controlled by the signal relay and the pause relay and
indicate the program run optically.


The following modes of presentation are possible:

1.  One, two or three successive presentations of the signal pair A-B.

2.  The duration T_s of presenting the speech signals A and B is
    equal but can be varied from 2 to 15 seconds.

3.  The duration of the long pause between two repeated signal
    pairs ABAB can be varied from 4 to 20 seconds.
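
The timing of one program run can be illustrated with a short Python sketch. It uses the formula T_s = T_a - 0.5 s and the ranges of the three modes listed above; the function name, the event tuples and the assumption that no additional pause follows the last signal are choices made for this example only.

```python
def presentation_schedule(t_a, n_pairs=1, long_pause=4.0, switch_pause=0.5):
    """Return a list of (start time, label, duration) events in seconds.

    t_a         period of the astable multivibrator aMV
    n_pairs     one, two or three successive presentations of the pair A-B
    long_pause  pause between two repeated signal pairs (4 to 20 s)
    """
    if n_pairs not in (1, 2, 3):
        raise ValueError("n_pairs must be 1, 2 or 3")
    t_s = t_a - switch_pause          # duration of each speech signal
    if not 2.0 <= t_s <= 15.0:
        raise ValueError("T_s must lie between 2 and 15 seconds")
    if not 4.0 <= long_pause <= 20.0:
        raise ValueError("long pause must lie between 4 and 20 seconds")

    events, t = [], 0.0
    for pair in range(n_pairs):
        for label, duration in (("A", t_s), ("PAUSE", switch_pause), ("B", t_s)):
            events.append((t, label, duration))
            t += duration
        if pair < n_pairs - 1:        # long pause only between repeated pairs
            events.append((t, "PAUSE", long_pause))
            t += long_pause
    return events


events = presentation_schedule(5.5, n_pairs=2, long_pause=4.0)
for start, label, duration in events:
    print(f"{start:5.1f} s  {label:5s}  {duration:4.1f} s")
```

For T_a = 5.5 s and two presentations of the pair, each signal is heard for 5 seconds and the whole run ABAB lasts 25 seconds.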
Test sessions on Intelligibility
Test sessions on Loudness


80 test sessions of 2 hours each, 8 listeners on average
1300 listener hours
Approximately 64 000 decisions.

Use of the Computer Program PHIAPPROX 1

Premises:

Integer valued levels and attenuator settings have to be used; the
attenuator setting must not exceed 40 dB.

The speech level has to be fixed throughout the test run.

Reference signals have to be taken from a close series of ...
having incremental steps of 1 dB in their S/N ratios.

Data cards have to be punched in the format given below,
thus including all information for one listener during a single test
run. Cards to be evaluated together have to succeed one another,
forming a "block of data cards". The end of a block of data cards is
marked by a card which is blank at column 80. The first block of data
cards has to be preceded by a card having punched the printout symbols
*, O, blank, 1, 2 at the columns 1, 2, 3, 4, 5 respectively. The
arrangement of cards can be seen from the following figure.


Fig.: Arrangement of the card deck: main program PHIAPPROX
($IBFTC DECK1), subprogram GPHI ($IBFTC DECK2), symbol card (*O 12),
first block of data, blank card, second block of data, blank card.


Format of Data Cards


Name      Format  Columns        Description

ND(J),    40I1    1 through 40   Decisions of listener:
J = 1,40                         "1" for A and "2" for B.
                                 Column number corresponds to the
                                 respective attenuator setting of the
                                 distortion signal
NR        I3      41, 42, 43     Number of listener
NAME      A3      44, 45, 46     Name of listener
ITESTO    I3      47, 48, 49     Number of test run attended totally
                                 by the listener
ITESYS    I3      50, 51, 52     Number of test run with this test
                                 signal attended by the listener
IDATE     I6      53 through 58  Date of test session DAY, MONTH,
                                 (YEAR - 1900)
ITIME     I2      59, 60         Time of test session (HOUR)
ITESDA    I2      61, 62         No. of test run in this session
REFTYP    A3      63, 64, 65     Type of reference signal
LEVHIF    I2      66, 67         Level of hifi speech signal
LEVDIS    I2      68, 69         Level of distortion signal
TESTYP    A3      70, 71, 72     Type of test signal
LEVTES    I2      73, 74         Level of test signal
LISTQU    I1      80             Declaration of listener card
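
The card layout above, together with the rule that a card which is blank at column 80 ends a block, can be sketched as a small reader in Python. This is an illustration only, not part of the original program: the names `parse_card` and `split_blocks` and all values on the sample card are hypothetical. Blanks in numeric fields are read as zeros, as a FORTRAN I-format field would read them.

```python
# (name, first column, last column, is_integer); columns are 1-based.
CARD_LAYOUT = [
    ("NR", 41, 43, True), ("NAME", 44, 46, False), ("ITESTO", 47, 49, True),
    ("ITESYS", 50, 52, True), ("IDATE", 53, 58, True), ("ITIME", 59, 60, True),
    ("ITESDA", 61, 62, True), ("REFTYP", 63, 65, False), ("LEVHIF", 66, 67, True),
    ("LEVDIS", 68, 69, True), ("TESTYP", 70, 72, False), ("LEVTES", 73, 74, True),
    ("LISTQU", 80, 80, True),
]


def _to_int(text):
    """Read an integer field, treating blanks as zeros."""
    return int(text.replace(" ", "0"))


def parse_card(card):
    """Split one 80-column card image into a dictionary of fields."""
    card = card.ljust(80)[:80]
    record = {"ND": [_to_int(c) for c in card[:40]]}   # 0 = no decision
    for name, first, last, is_integer in CARD_LAYOUT:
        text = card[first - 1:last]
        record[name] = _to_int(text) if is_integer else text
    return record


def split_blocks(cards):
    """Group listener cards (LISTQU = 1) into blocks of data cards."""
    blocks, current = [], []
    for card in cards:
        record = parse_card(card)
        if record["LISTQU"] == 1:
            current.append(record)
        else:                      # blank at column 80 closes the block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# Hypothetical sample card (values chosen for illustration only).
sample = ("1" * 10 + "2" * 10 + " " * 20 + "010" + "DRW" + "043" + "008"
          + "210565" + "12" + "01" + "ADD" + "71" + "88" + "VON" + "84"
          + " " * 5 + "1")
record = parse_card(sample)
blocks = split_blocks([sample, sample, " " * 80])
print(record["NR"], record["NAME"], record["IDATE"], record["TESTYP"])
```

Reading the decisions as a list of 40 integers keeps the correspondence between column number and attenuator setting that the format description relies on.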
$DATE  09/17/65
$JOB   PHIAPPROX  PACHL
$TIME  300
$IBJOB
$IBFTC DECK1
      D2PI=1./(8.*ATAN(1.))
      W1D2PI=SQRT(D2PI)
      DIMENSION STNR(40),KSTNR(40)
      DIMENSION ND(40),XDEC(60),NDE(60),TOADD(60),SUMDEC(60),X(60)
      DIMENSION Y(60),Z(60),GFH(60),SPH(60),DIFF(60),QR(60),RR(60)
      DIMENSION QMY(60),QSI(60),RMY(60),RSI(60)
      DIMENSION APPROX(60),ERROR(60),IERR(60),SQER(60)
C
C     READING PRINTOUT SYMBOLS
C     FIRST DATACARD MUST BE *O 12
C
      READ751,STAR,OH,BLANK,EINS,ZWEI
  751 FORMAT(5A1)
C
C     PREPARATION OF FORM
C
    1 PRINT2
    2 FORMAT(32H1 MEASUREMENTS OF SPEECH QUALITY)
      PRINT5
    5 FORMAT(92H- -20. -15. -10.  -5.   0.   5.  10.  15.  20.  25.  30.
     1  35.  40.  DB SIGNAL-TO-NOISE RATIO)
      PRINT6
    6 FORMAT(1H-,67X,48HLISTENER TEST-NUMBER  DATE TIME TESTSYST REFSYST
     1)
      PRINT7
    7 FORMAT(1H ,67X,20HNR NAME TOT SYS DAY ,12X,27HTYPE LEV TYPE LEVHIF
     1 LEVDIS)
      DO3 J=1,60
    3 SUMDEC(J)=0.
      PRINT4
    4 FORMAT(1H )
C
C     READING DATA
C
   11 READ12,(ND(J),J=1,40),NR,NAME,ITESTO,ITESYS,IDATE,ITIME,ITESDA,
     1REFTYP,LEVHIF,LEVDIS,TESTYP,LEVTES,LISTQU
   12 FORMAT(40I1,I3,A3,I3,I3,I6,I2,I2,A3,I2,I2,A3,I2,5X,I1)
      IF(LISTQU.NE.1)GOTO101
C
C     CALCULATION OF SIGNAL-TO-NOISE RATIOS
C
      DO13 J=1,40
      STNR(J)=LEVHIF-LEVDIS+J
   13 KSTNR(J)=STNR(J)+21.
      DO14 J=1,60
   14 XDEC(J)=BLANK
      DO15 J=1,40
      K=KSTNR(J)
      IF(ND(J).NE.1)GOTO16


Computer Program for Data Evaluation
      XDEC(K)=EINS
      GOTO15
   16 IF(ND(J).NE.2)GOTO15
      XDEC(K)=ZWEI
   15 CONTINUE
      PRINT17,(XDEC(K),K=1,60),NR,NAME,ITESTO,ITESYS,ITESDA,IDATE,ITIME,
     1TESTYP,LEVTES,REFTYP,LEVHIF,LEVDIS
   17 FORMAT(1H ,4X,60A1,I6,2X,A3,2I4,I3,I7,I3,3H00 ,A3,I5,2X,A3,2I7)
C     PREPARING DATA FOR COMPUTATION
      NP=60
      DIMENSION NOLIST(60),XNOLIS(60)
      DO21 J=1,60
   21 TOADD(J)=0
      DO22 J=1,40
      K=KSTNR(J)
      IF(ND(J).EQ.0)GOTO22
      TOADD(K)=ND(J)+999
   22 CONTINUE
      DO23 J=1,60
   23 SUMDEC(J)=SUMDEC(J)+TOADD(J)
      GOTO11
  101 CONTINUE
      DO102 J=1,60
      NOLIST(J)=SUMDEC(J)/1000.
  102 XNOLIS(J)=NOLIST(J)
      DO103 J=1,60
      MF=J
      IF(NOLIST(J).NE.0)GOTO105
  103 CONTINUE
  105 CONTINUE
      DO106 J=1,60
      K=61-J
      ML=K
      IF(NOLIST(K).NE.0)GOTO107
  106 CONTINUE
  107 CONTINUE
      MFM=MF-1
      IF(MFM.LT.1)GOTO109
      DO108 J=1,MFM
  108 Y(J)=0
  109 DO110 J=MF,ML
  110 Y(J)=(SUMDEC(J)-1000.*XNOLIS(J))/XNOLIS(J)
      MLP=ML+1
      IF(MLP.GT.60)GOTO113
      DO112 J=MLP,60
  112 Y(J)=1
  113 DO114 J=1,60
  114 X(J)=J-21


C
C     TEST WHETHER 00001111
C
      DO42 J=1,60
      JA=J
      IF(Y(J).NE.0.)GOTO43
   42 CONTINUE
   43 DO44 J=JA,60
      IF(Y(J).NE.1.)GOTO45
   44 CONTINUE
      GOTO46
   45 GOTO48
   46 XMY=FLOAT(JA)-21.5
      SIG=0.
      PRINT4
      PRINT302,XMY
      PRINT4
      PRINT303,SIG
      PRINT47
   47 FORMAT(10H-NO ERROR )
      GOTO1
   48 CONTINUE
C
C     ALREADY COMPUTED X(J), Y(J), J=1,NP
C
      JA=1
  401 CONTINUE
      SUPY=0
      SUNY=0
      DO402 J=1,JA
      SUPY=SUPY+Y(J)**2
  402 JB=JA+1
      DO403 J=JB,60
  403 SUNY=SUNY+(1.-Y(J))**2
      DISCRI=SUPY-SUNY
      IF(DISCRI.GE.0.)GOTO404
      JA=JA+1
      GOTO401
  404 XMY=X(JA)
      DO406 J=1,60
      IF(Y(J).NE.0.)GOTO407
  406 X1=X(J)
  407 DO408 J=1,60
      K=61-J
      IF(Y(K).NE.1.)GOTO409
  408 X2=X(K)
  409 SIG=(X2-X1)/4.
  204 IF(SIG.GT.0.)GOTO205
      SIG=1.
  205 CONTINUE
C     XMY, SIG ALREADY COMPUTED


C     STARTING POINT FOR ITERATION
      DIMENSION APP(60),ERSQER(3,3),XMYD(3),SILD(3),SIGD(3)
      DIMENSION DYFA(3,3)
      LAST=0
      DELTA=1.
  700 SIL=ALOG(SIG)
      IF(LAST.NE.1)GOTO701
      DELTA=DELTA/2.
  701 DO702 J=1,3
      AJ=J-2
      XMYD(J)=XMY+AJ*DELTA
      SILD(J)=SIL+AJ*DELTA
  702 SIGD(J)=EXP(SILD(J))
      DO703 L=1,3
      DO703 M=1,3
      ERSQER(L,M)=0
      DO703 J=MF,ML
      APP(J)=GPHI(X(J),XMYD(L),SIGD(M),W1D2PI)
  703 ERSQER(L,M)=ERSQER(L,M)+(Y(J)-APP(J))**2
      K1=2
      K2=2
      GOTO704
  707 CONTINUE
      K1=L1
      K2=L2
  704 DO705 L=1,3
      DO705 M=1,3
      DYFA(K1,K2)=ERSQER(K1,K2)-ERSQER(L,M)
      DYFF=DYFA(K1,K2)
      L1=L
      L2=M
      IF(DYFF.GT.0.)GOTO707
  705 CONTINUE
      XMY=XMYD(K1)
      SIG=SIGD(K2)
      IF(DELTA.LT.0.01)GOTO731
      IF(K2.NE.2)GOTO706
      IF(K1.NE.2)GOTO706
      LAST=1
      GOTO700
  706 LAST=0
      GOTO700
  731 CONTINUE
C     BEST MY AND SIGMA ARE EVALUATED
C     PRINTOUT OF RESULTS
      PRINT4
  301 PRINT302,XMY
  302 FORMAT(21H MEAN VALUE         =,F6.2,3H DB)
      PRINT4
      PRINT303,SIG
  303 FORMAT(21H STANDARD DEVIATION =,F6.2,3H DB)
  304 DO305 J=1,60
      APPROX(J)=GPHI(X(J),XMY,SIG,W1D2PI)
      ERROR(J)=Y(J)-APPROX(J)
  305 IERR(J)=100.*ERROR(J)+0.5
      PRINT4
      PRINT306,(IERR(J),J=1,30)
      PRINT306,(IERR(J),J=31,60)
  306 FORMAT(7H/ERROR=,30I4)
      PRINT4
      SME=0.
      DO307 J=1,60
      SQER(J)=ERROR(J)**2
  307 SME=SME+SQER(J)/20.
      PRINT308,SME
  308 FORMAT(21H/MEAN SQUARE ERROR  =,F12.10)
      GOTO1
      END
$IBFTC DECK2
C     CALCULATION OF CUMULATIVE NORMAL DISTRIBUTION
      FUNCTION GPHI(A,B,C,D)
      Z=(A-B)/C
      IF(Z.GT.5.)GOTO3
      IF(Z.LT.(-5.))GOTO4
      Z=Z/SQRT(2.)
      ZET=Z
      Z=ABS(Z)
      A1=0.278393
      A2=0.230389
      A3=0.000972
      A4=0.078108
      XNEN=(((A4*Z+A3)*Z+A2)*Z+A1)*Z+1.
      XNEN=XNEN**4
      ERFI=1.-1./XNEN
      IF(ZET.GE.0.)GOTO1
      GPHI=0.5-ERFI/2.
      GOTO2
    1 GPHI=0.5+ERFI/2.
    2 CONTINUE
      GOTO5
    3 GPHI=1
      GOTO5
    4 GPHI=0
    5 CONTINUE
      RETURN
      END

$ENTRY
*O 12

11111211111212222222  022DRK02500421056L>1201HAH7188VON84  1 

li  111111112222222222  0100RW0430082105651201HAM7188VON84  1 

11111111212222222222  CC2MUE04901 32105651201HAM7188VQN84  1 

11111111112222222222  005SVE0490132105651201HAM7188VON84  1 

llllllllir  2222222  C04SPA0430082 1056512  IHAM7188VON84  l 

11111111111111222222  011KLE013C0321056512  IHAM7188VON84  1 

11111111122222222222  G09BER04300821056512  1MAM7188VCN84  1 

11111221122222222222  C07PRA04901321056512  1HAM7188VON84  1 

1111 1222222222222222  001LER049C 1321056512  1HAM7188VON84  1 

1  111  1111212222222222  008ZLA043C0821056512  1MAM7188V0N84  1 
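
The evaluation performed by the program listed above can be summarized in a short Python sketch. It is a simplified re-expression, not a translation of the FORTRAN deck: the cumulative normal distribution uses the same four coefficients as the subprogram GPHI, and the fit varies the mean value and the logarithm of the standard deviation on a 3 x 3 grid whose step is halved whenever the centre point is best. The function names, the starting values and the synthetic test data are assumptions of this example.

```python
import math


def gphi(x, mu, sigma):
    """Cumulative normal distribution with the rational erf approximation
    (coefficients as in the subprogram GPHI)."""
    z = (x - mu) / sigma
    if z > 5.0:
        return 1.0
    if z < -5.0:
        return 0.0
    t = abs(z) / math.sqrt(2.0)
    poly = (((0.078108 * t + 0.000972) * t + 0.230389) * t + 0.278393) * t + 1.0
    erf = 1.0 - 1.0 / poly ** 4
    return 0.5 + math.copysign(erf, z) / 2.0


def squared_error(xs, ys, mu, sigma):
    """Sum of squared differences between data and fitted curve."""
    return sum((y - gphi(x, mu, sigma)) ** 2 for x, y in zip(xs, ys))


def fit(xs, ys, mu=0.0, sigma=1.0, tol=0.01):
    """Grid search for mean value and standard deviation (both in dB)."""
    delta = 1.0
    while delta >= tol:
        log_sigma = math.log(sigma)
        best_err, best = squared_error(xs, ys, mu, sigma), None
        for i in (-1, 0, 1):
            for j in (-1, 0, 1):
                if i == 0 and j == 0:
                    continue
                cand = (mu + i * delta, math.exp(log_sigma + j * delta))
                err = squared_error(xs, ys, *cand)
                if err < best_err:
                    best_err, best = err, cand
        if best is None:
            delta /= 2.0          # centre is best: refine the grid
        else:
            mu, sigma = best      # move to the better neighbour
    return mu, sigma


# Synthetic example: S/N axis from -20 to 39 dB as in the program,
# relative frequencies of decision "2" generated from a known curve.
xs = list(range(-20, 40))
ys = [gphi(x, 3.0, 2.0) for x in xs]
mu, sigma = fit(xs, ys)
print(f"mean value = {mu:.2f} dB, standard deviation = {sigma:.2f} dB")
```

In this reading, the fitted mean value corresponds to the isopreference level (50 % of the decisions for each signal) and the standard deviation describes the width of the transition between the two decisions.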
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